NumPy version of “Exponential weighted moving average”, equivalent to TradingView RMA - python

How do I compute an exponentially weighted moving average with alpha = 1/length that is equivalent to the RMA function in TradingView?
I tried all of the functions mentioned in NumPy version of "Exponential weighted moving average", equivalent to pandas.ewm().mean(), but I can't match the results to TradingView.
For this array:
src = np.array([4086.29, 4310.01, 4509.08, 4130.37, 3699.99, 3660.02, 4378.48, 4640.0, 5709.99, 5950.02])
with period = 3, the result should be:
array([ nan, nan, 4301.79333333, 4244.65222222,
4063.09814815, 3928.73876543, 4078.65251029, 4265.76834019,
4747.17556013, 5148.12370675])
Any ideas how to achieve it?

I managed to match it using pandas with the following parameters:
window = 14
df['mean'] = df['close'].ewm(alpha = 1/window, min_periods = window, adjust = False).mean()
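For a pure-NumPy equivalent: TradingView's RMA (Wilder's smoothing) is seeded with a simple average of the first length bars and then updated with alpha = 1/length. Here is a minimal sketch along those lines (the function name and explicit loop are mine, not from the thread), which reproduces the expected array from the question:
import numpy as np

def rma(src, length):
    """Wilder's smoothing: SMA seed, then out[i] = alpha*src[i] + (1-alpha)*out[i-1]."""
    alpha = 1.0 / length
    out = np.full(len(src), np.nan)
    out[length - 1] = src[:length].mean()        # seed with the simple average
    for i in range(length, len(src)):
        out[i] = alpha * src[i] + (1 - alpha) * out[i - 1]
    return out

src = np.array([4086.29, 4310.01, 4509.08, 4130.37, 3699.99,
                3660.02, 4378.48, 4640.0, 5709.99, 5950.02])
print(rma(src, 3))   # nan, nan, 4301.793..., 4244.652..., ...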


Calculating RSI for recent days

I'm working on improving my algo bot, and one thing I have implemented absolutely awfully is RSI. Since RSI is a lagging indicator I can't get recent values; the last date I get a value for is 8 days ago. I'm therefore looking to calculate it somehow from previous values and am looking for ideas on how to do so.
My data points:
[222.19000244140625, nan]
[222.19000244140625, nan]
[215.47000122070312, nan]
[212.25, nan]
[207.97000122070312, nan]
[206.3300018310547, nan]
[205.88999938964844, nan]
[208.36000061035156, nan]
[204.08999633789062, 10.720487433358727]
[197.00999450683594, 7.934105468501102]
[194.6699981689453, 7.224811311424375]
[190.66000366210938, 6.148330770309926]
[191.6300048828125, 9.861218420857213]
[189.13999938964844, 8.835726925023536]
[189.02000427246094, 8.785409465194874]
[187.02000427246094, 7.925663008903896]
[195.69000244140625, 37.989974096922204]
[196.9199981689453, 41.10776671337689]
[194.11000061035156, 36.33757785797855]
As you can see, 10.720487433358727 is my most recent value, but I'm sure bigger brains than mine can figure out a way to calculate it up until today.
Thanks for your help!
It is important to note that there are various ways of defining the RSI. It is commonly defined in at least two ways: using a simple moving average (SMA), or using an exponential moving average (EMA). Here's a code snippet that calculates both definitions of RSI and plots them for comparison. I'm discarding the first row after taking the difference, since it is always NaN by definition.
import pandas
import pandas_datareader.data as web
import datetime
import matplotlib.pyplot as plt
# Window length for moving average
window_length = 14
# Dates
start = '2020-12-01'
end = '2021-01-27'
# Get data
data = web.DataReader('AAPL', 'yahoo', start, end)
# Get just the adjusted close
close = data['Adj Close']
# Get the difference in price from previous step
delta = close.diff()
# Get rid of the first row, which is NaN since it did not have a previous
# row to calculate the differences
delta = delta[1:]
# Make the positive gains (up) and negative gains (down) Series
up, down = delta.copy(), delta.copy()
up[up < 0] = 0
down[down > 0] = 0
# Calculate the EWMA
roll_up1 = up.ewm(span=window_length).mean()
roll_down1 = down.abs().ewm(span=window_length).mean()
# Calculate the RSI based on EWMA
RS1 = roll_up1 / roll_down1
RSI1 = 100.0 - (100.0 / (1.0 + RS1))
# Calculate the SMA
roll_up2 = up.rolling(window_length).mean()
roll_down2 = down.abs().rolling(window_length).mean()
# Calculate the RSI based on SMA
RS2 = roll_up2 / roll_down2
RSI2 = 100.0 - (100.0 / (1.0 + RS2))
# Compare graphically
plt.figure(figsize=(8, 6))
RSI1.plot()
RSI2.plot()
plt.legend(['RSI via EWMA', 'RSI via SMA'])
plt.show()
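The snippet above smooths the gains and losses with span=window_length; the conventional Wilder RSI instead uses alpha = 1/length, the same RMA smoothing discussed at the top of this page. A minimal sketch of that variant, assuming close is a pandas Series of closing prices (the function name is mine):
import pandas as pd

def wilder_rsi(close: pd.Series, length: int = 14) -> pd.Series:
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    # Wilder's smoothing: recursive EWMA with alpha = 1/length
    avg_gain = gain.ewm(alpha=1/length, min_periods=length, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1/length, min_periods=length, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# usage: rsi = wilder_rsi(close, 14)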

Efficient expanding OLS in pandas

I would like to explore the solutions of performing expanding OLS in pandas (or other libraries that accept DataFrame/Series friendly) efficiently.
Assuming the dataset is large, I am NOT interested in any solutions with a for-loop;
I am looking for solutions about expanding rather than rolling. Rolling functions always require a fixed window while expanding uses a variable window (starting from beginning);
Please do not suggest pandas.stats.ols.MovingOLS because it is deprecated;
Please do not suggest other deprecated methods such as expanding_mean.
For example, there is a DataFrame df with two columns X and y. To make it simpler, let's just calculate beta.
Currently, I am thinking about something like
import numpy as np
import pandas as pd
import statsmodels.api as sm
def my_OLS_func(df, y_name, X_name):
    y = df[y_name]
    X = df[X_name]
    X = sm.add_constant(X)
    b = np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
    return b

df = pd.DataFrame({'X': [1, 2.5, 3], 'y': [4, 5, 6.3]})
df['beta'] = df.expanding().apply(my_OLS_func, args=('y', 'X'))
Expected values of df['beta'] are 0 (or NaN), 0.66666667, and 1.038462.
However, this method does not seem to work; the approach seems very inflexible, and I am not sure how one could pass the two Series as arguments.
Any suggestions would be appreciated.
One option is to use the RecursiveLS (recursive least squares) model from Statsmodels:
import numpy as np
import statsmodels.api as sm

# Simulate some data
rs = np.random.RandomState(seed=12345)
nobs = 100000
beta = [10., -0.2]
sigma2 = 2.5
exog = sm.add_constant(rs.uniform(size=nobs))
eps = rs.normal(scale=sigma2**0.5, size=nobs)
endog = np.dot(exog, beta) + eps
# Construct and fit the recursive least squares model
mod = sm.RecursiveLS(endog, exog)
res = mod.fit()
# This is a 2 x 100,000 numpy array with the regression coefficients
# that would be estimated when using data from the beginning of the
# sample to each point. You should usually ignore the first k=2
# datapoints since they are controlled by a diffuse prior.
res.recursive_coefficients.filtered
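As a rough check against the three-row example in the question (my own snippet, not part of the original answer), the filtered recursive coefficients should in principle reproduce the expanding OLS slopes quoted there (roughly 0.667 and 1.038 for the last two points):
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({'X': [1, 2.5, 3], 'y': [4, 5, 6.3]})
mod = sm.RecursiveLS(df['y'], sm.add_constant(df['X']))
res = mod.fit()
# Row 0 is the constant, row 1 the slope; each column is the estimate
# using data from the start of the sample up to that point.
print(res.recursive_coefficients.filtered[1])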

how do I compute a weighted moving average using pandas

Using pandas I can compute
simple moving average SMA using pandas.stats.moments.rolling_mean
exponential moving average EMA using pandas.stats.moments.ewma
But how do I compute a weighted moving average (WMA) as described in wikipedia http://en.wikipedia.org/wiki/Exponential_smoothing ... using pandas?
Is there a pandas function to compute a WMA?
Using pandas you can calculate a weighted moving average (wma) using:
.rolling() combined with .apply()
Here's an example with 3 weights and window=3:
import numpy as np
import pandas as pd

data = {'colA': np.random.randint(1, 6, 10)}
df = pd.DataFrame(data)
weights = np.array([0.5, 0.25, 0.25])
sum_weights = np.sum(weights)
df['weighted_ma'] = (df['colA']
    .rolling(window=3, center=True)
    .apply(lambda x: np.sum(weights * x) / sum_weights, raw=False)
)
Please note that in .rolling() I have used the argument center=True.
You should check whether this applies to your use case or whether you need center=False.
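A small aside on the snippet above (my own note, not part of the original answer): passing raw=True hands each window to the lambda as a plain NumPy array rather than a Series, which is typically faster:
df['weighted_ma'] = (df['colA']
    .rolling(window=3, center=True)
    .apply(lambda x: np.sum(weights * x) / sum_weights, raw=True)
)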
No, there is no implementation of that exact algorithm. I created a GitHub issue about it here:
https://github.com/pydata/pandas/issues/886
I'd be happy to take a pull request for this; the implementation should be straightforward Cython coding and can be integrated into pandas.stats.moments.
If data is a Pandas DataFrame or Series and you want to compute the WMA over the rows, you can do it using
wma = data[::-1].cumsum().sum() * 2 / data.shape[0] / (data.shape[0] + 1)
If you want a rolling WMA of window length n, use
data.rolling(n).apply(lambda x: x[::-1].cumsum().sum() * 2 / n / (n + 1))
since within each window n = x.shape[0]. Note that this solution might be a bit slower than the one by Sander van den Oord, but you don't have to worry about the weights.
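As a quick sanity check (my own example, not from the answer), the cumulative-sum trick matches an explicit linearly weighted mean with weights 1..n, the newest value weighted most heavily:
import numpy as np
import pandas as pd

s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
n = 3

trick = s.rolling(n).apply(lambda x: x[::-1].cumsum().sum() * 2 / n / (n + 1))

weights = np.arange(1, n + 1)                  # 1, 2, ..., n
explicit = s.rolling(n).apply(lambda x: np.dot(x, weights) / weights.sum())

print(np.allclose(trick.dropna(), explicit.dropna()))  # True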
Construct a kernel with the weights, and apply it to your series using numpy.convolve.
import pandas as pd
import numpy as np
def wma(arr, period):
    kernel = np.arange(period, 0, -1)
    kernel = np.concatenate([np.zeros(period - 1), kernel / kernel.sum()])
    return np.convolve(arr, kernel, 'same')

df = pd.DataFrame({'value': np.arange(11)})
df['wma'] = wma(df['value'], 4)
Here I am interpreting WMA according to this page: https://en.wikipedia.org/wiki/Moving_average
For this type of WMA, the weights should be a linear range of n values, adding up to 1.0.
Note that I pad the front of the kernel with zeros. This is because we want a 'one-sided' window function, so that 'future' values in the time series do not affect the moving average.
numpy.convolve is fast, unlike apply()!
You can also use numpy.correlate if you reverse the kernel.

statistics for histogram of periodic data

For a series of angle values in the (-pi, pi) range, I make a histogram. Is there an effective way to calculate the mean and the modal (most probable) value? Consider the following example:
import numpy as N, cmath
deg = N.pi/180.
d = N.array([-175., 170, 175, 179, -179])*deg
i = N.sum(N.exp(1j*d))
ave = cmath.phase(i)
i /= float(d.size)
stdev = -2. * N.log(N.sqrt(i.real**2 + i.imag**2))
print(ave/deg, stdev/deg)
Now, let's have a histogram:
counts, bins = N.histogram(data, N.linspace(-N.pi, N.pi, 360))
Is it possible to calculate mean, mode having counts and bins? For non-periodic data, calculation of a mean is straightforward:
ave = sum(counts*bins[:-1])
Calculating the modal value requires more effort, and actually I'm not sure my code below is correct: first I identify the bins which occur most frequently, and then I calculate an arithmetic mean of them:
cmax = counts.max()
mode = N.mean(N.take(bins, N.nonzero(counts == cmax)[0]))
I have no idea how to calculate the standard deviation from such data, though. One obvious solution to all my problems (at least those described above) is to convert the histogram data back into a data series and then use it in the calculations. This is neither elegant nor efficient, however.
Any hints will be much appreciated.
This is the partial solution I wrote.
import numpy as N, cmath
import scipy.stats as ST
d = [-175, 170.2, 175.57, 179, -179, 170.2, 175.57, 170.2]
deg = N.pi/180.
data = N.array(d)*deg
i = N.sum(N.exp(1j*data))
ave = cmath.phase(i) # correct and exact mean for periodic data
wrong_ave = N.mean(d)
i /= float(data.size)
stdev = -2. * N.log(N.sqrt(i.real**2 + i.imag**2))
wrong_stdev = N.std(d)
bins = N.linspace(-N.pi, N.pi, 360)
counts, bins = N.histogram(data, bins)
# consider it weighted vector addition
nz = N.nonzero(counts)[0]
weight = counts[nz]
i = N.sum(weight * N.exp(1j*bins[nz])/len(nz))
pave = cmath.phase(i) # correct and approximated mean for periodic data
i /= sum(weight)/float(len(nz))
pstdev = -2. * N.log(N.sqrt(i.real**2 + i.imag**2))
print()
print('scipy: %12.3f (mean) %12.3f (stdev)' % (ST.circmean(data)/deg,
                                               ST.circstd(data)/deg))
When run, it gives the following results:
mean: 175.840 85.843 175.360
stdev: 0.472 151.785 0.430
scipy: 175.840 (mean) 3.673 (stdev)
A few comments now: the first column gives the mean/stdev calculated from the raw data. As can be seen, the mean agrees well with scipy.stats.circmean (thanks JoeKington for pointing it out). Unfortunately the stdev differs; I will look at it later. The second column gives completely wrong results (the non-periodic mean/std from numpy obviously does not work here). The third column gives what I wanted to obtain from the histogram data (@JoeKington: my raw data won't fit in my computer's memory; @dmytro: thanks for your input: of course, the bin size will influence the result, but in my application I don't have much choice, i.e. I have to reduce the data somehow). As can be seen, the mean (third column) is calculated properly; the stdev needs further attention :)
Have a look at scipy.stats.circmean and scipy.stats.circstd.
Or do you only have the histogram counts, and not the "raw" data? If so, you could fit a Von Mises distribution to your histogram counts and approximate the mean and stddev in that way.
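A minimal usage sketch of those two functions on the angle samples from the question (passing high/low keeps the result in the same (-pi, pi) range as the input, instead of SciPy's default (0, 2*pi)):
import numpy as np
from scipy import stats

deg = np.pi / 180.0
data = np.array([-175.0, 170.0, 175.0, 179.0, -179.0]) * deg

mean = stats.circmean(data, high=np.pi, low=-np.pi)
std = stats.circstd(data, high=np.pi, low=-np.pi)
print(mean / deg, std / deg)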
Here's how to get an approximation.
Since Var(x) = <x^2> - <x>^2, we have:
meanX = N.sum(counts * bins[:-1]) / N.sum(counts)
meanX2 = N.sum(counts * bins[:-1]**2) / N.sum(counts)
std = N.sqrt(meanX2 - meanX**2)
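And a circular analogue of the same approximation, working only from the counts and the bin centres. This is a sketch along the lines of the partial solution above, except that it uses the standard circular standard deviation sqrt(-2*ln R) rather than -2*ln R:
import numpy as np

deg = np.pi / 180.0
data = np.array([-175.0, 170.0, 175.0, 179.0, -179.0]) * deg
counts, bins = np.histogram(data, np.linspace(-np.pi, np.pi, 360))

centers = 0.5 * (bins[:-1] + bins[1:])         # representative angle of each bin
w = counts / counts.sum()
z = np.sum(w * np.exp(1j * centers))           # weighted mean resultant vector
mean = np.angle(z)                             # circular mean
std = np.sqrt(-2.0 * np.log(np.abs(z)))        # circular standard deviation
print(mean / deg, std / deg)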
