I'm working on improving my algo bot, and one thing I have implemented absolutely awfully is RSI. Since RSI is a lagging indicator I can't get recent data; the last date I get a value for is 8 days ago. I'm therefore looking to calculate it somehow from previous values, and I'm looking for ideas on how to do so.
My data points (each row is [price, RSI], newest first):
[222.19000244140625, nan]
[222.19000244140625, nan]
[215.47000122070312, nan]
[212.25, nan]
[207.97000122070312, nan]
[206.3300018310547, nan]
[205.88999938964844, nan]
[208.36000061035156, nan]
[204.08999633789062, 10.720487433358727]
[197.00999450683594, 7.934105468501102]
[194.6699981689453, 7.224811311424375]
[190.66000366210938, 6.148330770309926]
[191.6300048828125, 9.861218420857213]
[189.13999938964844, 8.835726925023536]
[189.02000427246094, 8.785409465194874]
[187.02000427246094, 7.925663008903896]
[195.69000244140625, 37.989974096922204]
[196.9199981689453, 41.10776671337689]
[194.11000061035156, 36.33757785797855]
As you can see, 10.720487433358727 is my most recent value, but I'm sure bigger brains than mine can figure out a way to calculate it up until today.
Thanks for your help!
It is important to note that there are various ways of defining the RSI. It is commonly defined in at least two ways: using a simple moving average (SMA) or using an exponential moving average (EMA). Here's a code snippet that calculates both definitions of RSI and plots them for comparison. I'm discarding the first row after taking the difference, since it is always NaN by definition.
import pandas
import pandas_datareader.data as web
import datetime
import matplotlib.pyplot as plt
# Window length for moving average
window_length = 14
# Dates
start = '2020-12-01'
end = '2021-01-27'
# Get data
data = web.DataReader('AAPL', 'yahoo', start, end)
# Get just the adjusted close
close = data['Adj Close']
# Get the difference in price from previous step
delta = close.diff()
# Get rid of the first row, which is NaN since it did not have a previous
# row to calculate the differences
delta = delta[1:]
# Make the positive gains (up) and negative gains (down) Series
up, down = delta.copy(), delta.copy()
up[up < 0] = 0
down[down > 0] = 0
# Calculate the EWMA
roll_up1 = up.ewm(span=window_length).mean()
roll_down1 = down.abs().ewm(span=window_length).mean()
# Calculate the RSI based on EWMA
RS1 = roll_up1 / roll_down1
RSI1 = 100.0 - (100.0 / (1.0 + RS1))
# Calculate the SMA
roll_up2 = up.rolling(window_length).mean()
roll_down2 = down.abs().rolling(window_length).mean()
# Calculate the RSI based on SMA
RS2 = roll_up2 / roll_down2
RSI2 = 100.0 - (100.0 / (1.0 + RS2))
# Compare graphically
plt.figure(figsize=(8, 6))
RSI1.plot()
RSI2.plot()
plt.legend(['RSI via EWMA', 'RSI via SMA'])
plt.show()
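On the question of carrying the RSI forward day by day: a third common definition is Wilder's original smoothing (alpha = 1/window), whose averages can be updated one bar at a time from the previous averages. Below is a minimal sketch of that recursive update; the function name update_rsi and the assumption that you keep yesterday's smoothed average gain/loss around are mine, not part of the code above.
def update_rsi(avg_gain, avg_loss, price_change, window_length=14):
    # One-step Wilder update: take yesterday's smoothed average gain/loss
    # and today's price change, return the new averages and the new RSI.
    # (Sketch only; assumes avg_gain/avg_loss were seeded elsewhere,
    # e.g. with a simple average over the first window.)
    gain = max(price_change, 0.0)
    loss = max(-price_change, 0.0)
    avg_gain = (avg_gain * (window_length - 1) + gain) / window_length
    avg_loss = (avg_loss * (window_length - 1) + loss) / window_length
    rs = avg_gain / avg_loss if avg_loss != 0 else float('inf')
    rsi = 100.0 - 100.0 / (1.0 + rs)
    return avg_gain, avg_loss, rsi
With this recursion you only need the last pair of averages to extend the series to today, which is exactly the "calculate it from previous values" the question asks about.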
Related
How do I get an exponentially weighted moving average with alpha = 1/length, equivalent to the RMA function in TradingView?
I tried all the functions mentioned in "NumPy version of 'Exponential weighted moving average', equivalent to pandas.ewm().mean()", but I can't match the results to TradingView.
This array
src = np.array([4086.29, 4310.01, 4509.08, 4130.37, 3699.99, 3660.02, 4378.48, 4640.0, 5709.99, 5950.02])
with period = 3
Should give results:
array([ nan, nan, 4301.79333333, 4244.65222222,
4063.09814815, 3928.73876543, 4078.65251029, 4265.76834019,
4747.17556013, 5148.12370675])
Any ideas how to achieve it?
I managed to get it matched with the following parameters using pandas:
window = 14
df['mean'] = df['close'].ewm(alpha = 1/window, min_periods = window, adjust = False).mean()
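For reference, the expected values in the question correspond to Wilder-style smoothing that is seeded with the SMA of the first period values and then applies alpha = 1/period recursively. A minimal NumPy sketch of that recursion (the helper name rma is mine, not TradingView's code) reproduces the array above for period = 3:
import numpy as np

def rma(src, period):
    # Wilder-style smoothing: SMA seed, then alpha = 1/period recursion
    alpha = 1.0 / period
    out = np.full(len(src), np.nan)
    out[period - 1] = src[:period].mean()   # seed with the SMA of the first `period` values
    for i in range(period, len(src)):
        out[i] = alpha * src[i] + (1 - alpha) * out[i - 1]
    return out

src = np.array([4086.29, 4310.01, 4509.08, 4130.37, 3699.99,
                3660.02, 4378.48, 4640.0, 5709.99, 5950.02])
print(rma(src, 3))   # matches the expected array above
The difference between this SMA seed and the adjust=False recursion used by ewm() decays geometrically, which is presumably why the pandas call above lines up with TradingView once min_periods bars have passed.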
I'm testing the arch package to forecast the variance (standard deviation) of two series using GARCH(1,1).
This is the first part of my code
import pandas as pd
import numpy as np
from arch import arch_model
returns = pd.read_csv('ret_full.csv', index_col=0)
returns.index = pd.to_datetime(returns.index)
Ibovespa Returns
The first series is the 1st futures contract of the Ibovespa Index; it has an observed annualized volatility really close to the GARCH forecast.
The first problem I've found is that you need to rescale your sample by 100. To do this, you can multiply your return series by 100 or set the parameter rescale=True in the arch_model function.
Why is it necessary to do this?
# Ibov
ret_ibov = returns['IBOV_1st']
model_ibov = arch_model(ret_ibov, vol='Garch', p=1, o=0, q=1, dist='Normal', rescale=True)
res_ibov = model_ibov.fit()
After fitting the model I forecast the variance (just 5 steps to illustrate the problem), take the standard deviation and annualize it. Note: since I had to rescale my return series, I divide my forecast by 10000 (100**2, because of the rescaling).
# Forecast
forecast_ibov = res_ibov.forecast(horizon=5)
# Getting Annualized Standard Deviation
# Garch Vol
vol_ibov_for = (forecast_ibov.variance.iloc[-1]/10000)**0.5 * np.sqrt(252) * 100
# Observed Vol
vol_ibov = ret_ibov.std() * np.sqrt(252) * 100
And that's the forecast output
vol_ibov_for
h.1 24.563208
h.2 24.543245
h.3 24.523969
h.4 24.505357
h.5 24.487385
Which is really close to the observed vol of 23.76.
This is the result I was expecting.
IRFM Returns
When I apply exactly the same process to a less volatile series, I get a really weird result.
# IRFM
ret_irfm = returns['IRFM1M']
model_irfm = arch_model(ret_irfm, vol='Garch', p=1, o=0, q=1, dist='Normal', rescale=True)
res_irfm = model_irfm.fit()
# Forecast
forecasts_irfm = res_irfm.forecast(horizon=5)
# Getting Annualized Standard Deviation
# Garch Vol
vol_irfm_for = (forecasts_irfm.variance.iloc[-1]/10000)**0.5 * np.sqrt(252) * 100
# Observed Vol
vol_irfm = ret_irfm.std() * np.sqrt(252) * 100
Forecast output:
vol_irfm_for
h.1 47.879679
h.2 49.322351
h.3 50.519282
h.4 51.517356
h.5 52.352894
And this is significantly different from the observed volatility of 5.39.
Why is this happening? Maybe because of the rescaling? Do I have to make another adjustment before forecasting?
Thanks
Found the answer.
rescale=True is meant for cases where the model fails to converge, so rescaling can be the solution to that problem; if the series doesn't need rescaling, the parameter does nothing even when it is set to True.
Point of attention: if rescale=True and the series was in fact rescaled, it's necessary to adjust the outputs. In my question I was confused about how high my volatility came out; that's because I was assuming the rescale factor was 100, which is not necessarily true.
The correct thing to do is to set the parameter to True and retrieve the actual scale factor after fitting.
To do this, you just need to insert the following code:
# IRFM
ret_irfm = returns['IRFM1M']
model_irfm = arch_model(ret_irfm, vol='Garch', p=1, o=0, q=1, dist='Normal', rescale=True, mean='Zero')
res_irfm = model_irfm.fit()
scale = res_irfm.scale # New part of the code
# Forecast
forecasts_irfm = res_irfm.forecast(horizon=5)
# Getting Annualized Standard Deviation
# Garch Vol
# New part of the code: Divide variance by scale^2
vol_irfm_for = (forecasts_irfm.variance.iloc[-1] / np.power(scale, 2))**0.5 * np.sqrt(252) * 100
# Observed Vol
vol_irfm = ret_irfm.std() * np.sqrt(252) * 100
Hope this helps other users with the same problem. It's a really simple thing.
Thanks.
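As a cross-check, you can also take the manual route mentioned in the question: multiply the returns by 100 yourself and pass rescale=False, in which case the factor to divide the forecast variance by is exactly 100**2. A sketch under those assumptions, reusing ret_irfm, arch_model and np from the code above:
# Manual rescaling: now the scale factor is known to be exactly 100
ret_irfm_pct = ret_irfm * 100
model_manual = arch_model(ret_irfm_pct, vol='Garch', p=1, o=0, q=1,
                          dist='Normal', rescale=False, mean='Zero')
res_manual = model_manual.fit(disp='off')
forecasts_manual = res_manual.forecast(horizon=5)
# Undo the manual scaling (divide the variance by 100**2), then annualize
vol_manual = (forecasts_manual.variance.iloc[-1] / 100**2)**0.5 * np.sqrt(252) * 100
print(vol_manual)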
I am trying to calculate the probability of transmission for an electron through a series of potential wells. When looping through energy values using np.linspace() I get a return of nan for any value under 15. I understand this for the values 0 and 15, since they give a zero denominator in the k and q terms. If I simply call getT(5), for example, I get a real value. However, when getT(5) gets called from the loop using np.linspace(0, 30, 2001) it returns nan. Shouldn't it return either nan or a value in both cases?
import numpy as np
import matplotlib.pyplot as plt
def getT(Ein):
    # constants
    hbar = 1.055e-34      # J-s
    m = 9.109e-31         # mass of electron, kg
    N = 10                # number of cells
    a = 1e-10             # meters
    b = 2e-10             # meters
    # convert energy and potential to Joules
    conv_J = 1.602e-19
    E_eV = Ein
    V_eV = 15
    E = conv_J * E_eV
    V = conv_J * V_eV
    # calculate values for k and q
    k = (2*m*E/hbar**2)**.5
    q = (2*m*(E-V)/hbar**2)**.5
    # create M1, M2 matrices
    M1 = np.matrix([[((q+k)/(2*q))*np.exp(1j*k*b), ((q-k)/(2*q))*np.exp(-1j*k*b)],
                    [((q-k)/(2*q))*np.exp(1j*k*b), ((q+k)/(2*q))*np.exp(-1j*k*b)]])
    M2 = np.matrix([[((q+k)/(2*k))*np.exp(1j*q*a), ((k-q)/(2*k))*np.exp(-1j*q*a)],
                    [((k-q)/(2*k))*np.exp(1j*q*a), ((q+k)/(2*k))*np.exp(-1j*q*a)]])
    # calculate the transfer matrix for one cell
    M_Cell = M1*M2
    # calculate M for N cells
    M = M_Cell**N
    # get the items of M
    M11 = M.item(0, 0)
    M12 = M.item(0, 1)
    M21 = M.item(1, 0)
    M22 = M.item(1, 1)
    # calculate r and t values
    r = -M21/M22
    t = M11 - M12*M21/M22
    # calculate final T value
    T = abs(t)**2
    return Ein, T
#create empty array for data to plot
data=[]
# Calculate T for 2001 values of E between 0 and 30 eV
for i in np.linspace(0, 30, 2001):
    data.append(getT(i))
data=np.transpose(data)
#generate plot
fig, (ax1)=plt.subplots(1)
ax1.set_xlim([0,30])
ax1.set_xlabel('Energy (eV)',fontsize=32)
ax1.set_ylabel('T',fontsize=32)
ax1.grid()
plt.tick_params(labelsize=32)
plt.plot(data[0],data[1],lw=6)
plt.draw()
plt.show()
I think the difference comes from the line
q=(2*m*(E-V)/hbar**2)**.5
When testing with single values between 0 and 15, you're taking the square root of a negative number (because E-V is negative). With a plain Python float this gives a complex result rather than nan, for example:
(-2)**0.5
>> (8.659560562354934e-17+1.4142135623730951j)
But when the energies come from np.linspace they are NumPy floats, and raising a negative NumPy float (or array) to the power 0.5 results in nan (and a warning):
np.array(-2)**0.5
>> RuntimeWarning: invalid value encountered in power
>> nan
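If you want the two cases to behave the same, one option (assuming you actually want the complex, evanescent q for E < V rather than nan) is NumPy's complex-aware square root, which returns complex values for negative real inputs whether they are Python or NumPy floats:
import numpy as np

x = np.float64(-2.0)
print(x ** 0.5)                # nan, with a RuntimeWarning
print(np.lib.scimath.sqrt(x))  # 1.4142135623730951j

# Inside getT this would mean, for example:
# k = np.lib.scimath.sqrt(2*m*E) / hbar
# q = np.lib.scimath.sqrt(2*m*(E - V)) / hbar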
I have daily returns from three markets (GLD, SPY, and USO). My goal is to calculate the average pairwise correlation from a correlation matrix on a rolling basis of 130 days.
My starting point was:
import numpy as np
import pandas as pd
import os as os
import pandas.io.data as web
import datetime as datetime
from pandas.io.data import DataReader
stocks = ['spy', 'gld', 'uso']
start = datetime.datetime(2010,1,1)
end = datetime.datetime(2016,1,1)
df = web.DataReader(stocks, 'yahoo', start, end)
adj_close_df = df['Adj Close']
returns = adj_close_df.pct_change(1).dropna()
returns = returns.dropna()
rollingcor = returns.rolling(130).corr()
This creates a panel of correlation matrices. However, extracting the lower (or upper) triangles, removing the diagonals and then calculating the average for each observation is where I've drawn a blank. Ideally I would like the output for each date to be in a Series that I can then index by date.
Maybe I've started from the wrong place but any help would be appreciated.
To get the average pairwise correlation, you can take the sum of the correlation matrix, subtract n (the ones on the diagonal), divide by 2 (symmetry), and finally divide by the number of unique pairs, n*(n-1)/2, which for three stocks happens to equal n = 3. I think this should do it:
>>> n = len(stocks)
>>> ((rollingcor.sum(skipna=0).sum(skipna=0) - n) / 2) / n
Date
2010-01-05 NaN
2010-01-06 NaN
2010-01-07 NaN
...
2015-12-29 0.164356
2015-12-30 0.168102
2015-12-31 0.166462
dtype: float64
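To see the arithmetic on a single matrix: for n markets there are n*(n-1)/2 unique pairs, so the average pairwise correlation is (matrix sum - n) / 2 divided by that pair count. A quick check with a small made-up 3x3 correlation matrix (not real data):
import numpy as np

# Hypothetical correlation matrix for three markets
corr = np.array([[1.0, 0.2, 0.4],
                 [0.2, 1.0, 0.6],
                 [0.4, 0.6, 1.0]])
n = corr.shape[0]
pairs = n * (n - 1) / 2                  # 3 unique pairs
avg = (corr.sum() - n) / 2 / pairs       # mean of 0.2, 0.4, 0.6
print(avg)                               # 0.4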
You could use numpy's tril to access the lower triangle of the dataframe.
def tril_sum(df):
    # -1 ensures we skip the diagonal
    return np.tril(df.unstack().values, -1).sum()
This calculates the sum of the lower triangle of the matrix. Notice the unstack() in the middle: I'm expecting a MultiIndex series that I need to pivot into a DataFrame.
Then apply it to your panel
n = len(stocks)
avg_cor = rollingcor.dropna().to_frame().apply(tril_sum) / ((n ** 2 - n) / 2)
Looks like:
print(avg_cor.head())
Date
2010-07-12 0.398973
2010-07-13 0.403664
2010-07-14 0.402483
2010-07-15 0.403252
2010-07-16 0.407769
dtype: float64
This answer skips the diagonals.
I want to compute the aggregated average of a signal over time, within a certain period. I don't know what this is called scientifically.
Example: I have electricity consumption for a full year in 15-minute values. I want to know my average consumption by hour of the day (24 values). But it is more complex: there are additional measurements in between the 15-minute steps, and I cannot foresee where they are. However, they should be taken into account with a correct 'weight'.
I wrote a function that works, but it is extremely slow. Here is a test setup:
import numpy as np
signal = np.arange(6)
time = np.array([0, 2, 3.5, 4, 6, 8])
period = 4
interval = 2
def aggregate(signal, time, period, interval):
    pass
aggregated = aggregate(signal, time, period, interval)
# This should be the result: aggregated = array([ 2. , 3.125])
aggregated should have period/interval values. This is the manual computation:
aggregated[0] = (np.trapz(y=np.array([0, 1]), x=np.array([0, 2]))/interval + \
np.trapz(y=np.array([3, 4]), x=np.array([4, 6]))/interval) / (period/interval)
aggregated[1] = (np.trapz(y=np.array([1, 2, 3]), x=np.array([2, 3.5, 4]))/interval + \
np.trapz(y=np.array([4, 5]), x=np.array([6, 8]))/interval) / (period/interval)
One last detail: it has to be efficient; that's why my own solution is not useful. Maybe I'm overlooking a numpy or scipy method? Or is this something pandas can do?
Thanks a lot for your help.
I would strongly recommend using Pandas. Here I'm using version 0.8 (soon to be released). I think this is close to what you want.
import pandas as p
import numpy as np
import matplotlib.pyplot as plt
# Make up some data:
time = p.date_range(start='2011-05-23', end='2012-05-23', freq='min')
watts = np.linspace(0, 3.14 * 365, time.size)
watts = 38 * (1.5 + np.sin(watts)) + 8 * np.sin(5 * watts)
# Create a time series
ts = p.Series(watts, index=time, name='watts')
# Resample down to 15 minute pieces, using mean values
ts15 = ts.resample('15min', how='mean')  # in newer pandas: ts.resample('15min').mean()
ts15.plot()
plt.show()
Pandas can easily do many other things with your data (like determine your average weekly energy profile). Check out p.read_csv() for reading in your data.
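Building on that, the "average consumption by hour of the day" from the question can be read straight off the resampled series by grouping on the hour of its DatetimeIndex. A minimal sketch, reusing ts15 from above:
# 24 values: the mean of all 15-minute samples that fall in each hour of the day
daily_profile = ts15.groupby(ts15.index.hour).mean()
print(daily_profile)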
I think this is pretty close to what you need. I'm not sure I interpreted interval and period correctly, but I think I got it right to within some constant factor.
import numpy as np
def aggregate(signal, time, period, interval):
    assert (period % interval) == 0
    ipp = int(period // interval)   # intervals per period
    # each sample is weighted by the width of its midpoint-to-midpoint cell
    midpoint = np.r_[time[0], (time[1:] + time[:-1]) / 2., time[-1]]
    # cumulative "integral" of the signal over those cells
    cumsig = np.r_[0, (np.diff(midpoint) * signal).cumsum()]
    # regular grid of interval boundaries covering the whole periods
    grid = np.linspace(0, time[-1], int(np.floor(time[-1] / period)) * ipp + 1)
    cumsig = np.interp(grid, midpoint, cumsig)
    # sum matching intervals across periods, then divide to get the average
    return np.diff(cumsig).reshape(-1, ipp).sum(0) / period
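Plugging in the test setup from the question (with the function as written above), this should reproduce the expected result:
signal = np.arange(6)
time = np.array([0, 2, 3.5, 4, 6, 8])
print(aggregate(signal, time, period=4, interval=2))   # [2.    3.125]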
I worked out a function that does exactly what I wanted based on the previous answers and on pandas.
import numpy as np
import pandas
import datetime

def aggregate_by_time(signal, time, period=86400, interval=900, label='left'):
    """
    Calculate the aggregated average of a timeseries by period (typically
    a day) in bins of interval seconds (default = 900 s).
    label = 'left' or 'right'. 'left' means that label i contains data from
    i till i+1, 'right' means that label i contains data from i-1 till i.
    Returns a DataFrame with period/interval rows, one for each interval
    of the period.
    Note: the period has to be a multiple of the interval.
    """
    def make_datetimeindex(array_in_seconds, year):
        """
        Create a pandas DatetimeIndex from a time vector in seconds and the year.
        """
        start = datetime.datetime(year, 1, 1)
        datetimes = [start + datetime.timedelta(seconds=float(t)) for t in array_in_seconds]
        return pandas.DatetimeIndex(datetimes)

    interval_string = str(interval) + 'S'
    dr = make_datetimeindex(time, 2012)
    df = pandas.DataFrame(data=signal, index=dr, columns=['signal'])
    df15min = df.resample(interval_string, closed=label, label=label).mean()
    # now create bins for the groupby() method
    time_s = df15min.index.asi8 / 1e9
    time_s -= time_s[0]
    df15min['bins'] = np.mod(time_s, period)
    df_aggr = df15min.groupby(['bins']).mean()
    # if you only need the numpy array: take df_aggr.values
    return df_aggr