Converting linreg function from pinescript to Python? - python

I am trying to convert a TradingView indicator into Python (also using pandas to store its result).
This is the public code of the indicator I want to convert into Python:
https://www.tradingview.com/script/sU9molfV/
I am stuck reproducing Pine Script's built-in linreg function.
This is the fragment of the Pine Script indicator I am having trouble with:
lrc = linreg(src, length, 0)
lrc1 = linreg(src,length,1)
lrs = (lrc-lrc1)
TSF = linreg(src, length, 0)+lrs
This is its documentation:
Linear regression curve. A line that best fits the prices specified
over a user-defined time period. It is calculated using the least
squares method. The result of this function is calculated using the
formula: linreg = intercept + slope * (length - 1 - offset), where
length is the y argument, offset is the z argument, intercept and
slope are the values calculated with the least squares method on
source series (x argument). linreg(source, length, offset) →
series[float]
Source:
https://www.tradingview.com/pine-script-reference/#fun_linreg
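To make the documented formula concrete, here is a minimal sketch that evaluates linreg for a single window of prices using NumPy's least-squares polynomial fit. It assumes x = 0 is the oldest bar in the window and x = length - 1 the newest (my reading of the formula, not checked against TradingView's source); `linreg_window` is a hypothetical helper name.

```python
import numpy as np

def linreg_window(values, offset=0):
    """Least-squares line through one window of prices (oldest first),
    evaluated at x = length - 1 - offset as in the Pine documentation.
    Assumption: x = 0 is the oldest bar and x = length - 1 the newest."""
    y = np.asarray(values, dtype=float)
    length = len(y)
    x = np.arange(length)
    slope, intercept = np.polyfit(x, y, 1)  # polyfit returns the highest-degree coefficient first
    return intercept + slope * (length - 1 - offset)
```

For example, `linreg_window([1, 3, 2])` fits y = 1.5 + 0.5x and returns 2.5 (at x = 2); with `offset=1` it returns 2.0.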
I found this MQL4 code and tried to follow it step by step, in order to write a linreg function in Python that I can then use to build the Pine Script indicator:
https://www.mql5.com/en/code/8016
And this is my code so far:
# calculate linear regression:
# https://www.mql5.com/en/code/8016
barsToCount = 14
# sumy+=Close[i];
df['sumy'] = df['Close'].rolling(window=barsToCount).mean()
# sumxy+=Close[i]*i;
tmp = []
sumxy_lst = []
for window in df['Close'].rolling(window=barsToCount):
    for index in range(len(window)):
        tmp.append(window[index] * index)
    sumxy_lst.append(sum(tmp))
    del tmp[:]
df.loc[:,'sumxy'] = sumxy_lst
# sumx+=i;
sumx = 0
for i in range(barsToCount):
    sumx += i
# sumx2+=i*i;
sumx2 = 0
for i in range(barsToCount):
    sumx2 += i * i
# c=sumx2*barsToCount-sumx*sumx;
c = sumx2*barsToCount - sumx*sumx
# Line equation:
# b=(sumxy*barsToCount-sumx*sumy)/c;
df['b'] = ((df['sumxy']*barsToCount)-(sumx*df['sumy']))/c
# a=(sumy-sumx*b)/barsToCount;
df['a'] = (df['sumy']-sumx*df['b'])/barsToCount
# Linear regression line in buffer:
df['LR_line'] = 0.0
for x in range(barsToCount):
    # LR_line[x]=a+b*x;
    df['LR_line'].iloc[x] = df['a'].iloc[x] + df['b'].iloc[x] * x
    # print(x, df['a'].iloc[x], df['b'].iloc[x], df['b'].iloc[x]*x)
print(df.tail(50))
print(list(df))
It doesn't work.
Any idea how to implement an equivalent of Pine Script's linreg function in Python?
Thank you in advance!
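Two things in the port above clearly differ from the MQL4 sums it mirrors: `sumy` is computed with `.mean()` where the MQL4 code accumulates a sum, and the final loop only fills the first `barsToCount` rows, evaluating each at its row position instead of at the end of its window. The sketch below keeps the same sum-based least-squares formulas but applies them to every rolling window. It assumes x runs from 0 (oldest bar in the window) to length - 1 (newest) and evaluates the line at length - 1 - offset, as in the Pine documentation; note that in MQL4 `Close[0]` is the most recent bar, so a loop over `Close[i]` counts x in the opposite direction. The name `linreg` and its signature are illustrative choices.

```python
import numpy as np
import pandas as pd

def linreg(src: pd.Series, length: int, offset: int = 0) -> pd.Series:
    """Rolling least-squares line evaluated at x = length - 1 - offset.

    Uses the same sums as the MQL4 code, with x = 0 for the oldest bar of
    each window (an assumption about Pine's convention)."""
    x = np.arange(length, dtype=float)
    sumx = x.sum()                      # equals length*(length-1)/2
    sumx2 = (x * x).sum()               # equals (length-1)*length*(2*length-1)/6
    c = sumx2 * length - sumx * sumx

    sumy = src.rolling(length).sum()    # a sum, not a mean
    sumxy = src.rolling(length).apply(lambda w: np.dot(w, x), raw=True)

    b = (sumxy * length - sumx * sumy) / c   # slope
    a = (sumy - sumx * b) / length           # intercept
    return a + b * (length - 1 - offset)
```

Usage would be, e.g., `df['lrc'] = linreg(df['Close'], 14)`. The first length - 1 values are NaN because no full window exists yet.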

I used TA-Lib to calculate the slope and intercept of the closing prices, then realised TA-Lib also offers the full calculation (LINEARREG). The result looks the same as TradingView's (just eyeballing).
I did the following in JupyterLab:
import pandas as pd
import numpy as np
import talib as tl
from pandas_datareader import data
%run "../../plt_setup.py"
asset = data.DataReader('^AXJO', 'yahoo', start='1/1/2015')
n = 270
(asset
    .assign(linreg = tl.LINEARREG(asset.Close, n))
    [['Close', 'linreg']]
    .dropna()
    .loc['2019-01-01':]
    .plot()
);
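For the TSF line in the question, note that with the documented formula lrc - lrc1 = (a + b*(length-1)) - (a + b*(length-2)) = b, so lrs is just the regression slope and TSF = a + b*length. The sketch below computes this with plain NumPy/pandas instead of TA-Lib, fitting all windows at once with `sliding_window_view` (NumPy 1.20 or later) and `np.polyfit` on a 2-D array. The helper names are hypothetical, and the window convention (oldest bar at x = 0) is the same assumption as above. TA-Lib also has a TSF function which, as far as I know, computes the same quantity; I have not checked it against this sketch.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_line(src: pd.Series, length: int):
    """Slope and intercept of a least-squares line fitted to each window of `length` bars."""
    windows = sliding_window_view(src.to_numpy(dtype=float), length)  # (n - length + 1, length), oldest first
    x = np.arange(length)
    slope, intercept = np.polyfit(x, windows.T, 1)  # fits every column of windows.T at once
    pad = np.full(length - 1, np.nan)               # no value until a full window exists
    return np.concatenate([pad, slope]), np.concatenate([pad, intercept])

def tsf(src: pd.Series, length: int) -> pd.Series:
    slope, intercept = rolling_line(src, length)
    lrc = intercept + slope * (length - 1)    # linreg(src, length, 0)
    lrc1 = intercept + slope * (length - 2)   # linreg(src, length, 1)
    lrs = lrc - lrc1                          # this is just the slope
    return pd.Series(lrc + lrs, index=src.index)  # = intercept + slope * length
```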

Related

Step by step time integrators in Python

I am solving a first-order initial value problem of the form:
dy/dt = f(t,y(t)), y(0)=y0
I would like to obtain y(i) from y(i-1) with a given numerical scheme. For example, the explicit Euler scheme gives
y(i) = y(i-1) + f(t(i-1), y(i-1)) * dt
Example code:
# Test code to evaluate different time integrators for the following equation:
# y' = (1/2) y + 2sin(3t) ; y(0) = -24/37
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
def dy_dt(y,t):
    func = (1/2)*y + 2*np.sin(3*t)
    return func
tmin = 0
tmax = 50
delt= 1e-2
t = np.arange(tmin,tmax,delt)
total_steps = len(t)
y_explicit=np.zeros(total_steps)
#y_ODEint=np.zeros(total_steps)
y0 = -24/37
y_explicit[0]=y0
#y_ODEint[0]=y0
# exact solution
y_exact = -(24/37)*np.cos(3*t)- (4/37)*np.sin(3*t) + (y0+24/37)*np.exp(0.5*t)
# Solution using ODEint Python
y_ODEint = odeint(dy_dt,y0,t)
for i in range(1,total_steps):
    # Explicit scheme
    y_explicit[i] = y_explicit[i-1] + (dy_dt(y_explicit[i-1],t[i-1]))*delt
    # Update using ODEint
    # y_ODEint[i] = odeint(dy_dt,y_ODEint[i-1],[0,delt])[-1]
plt.figure()
plt.plot(t,y_exact)
plt.plot(t,y_explicit)
# plt.plot(t,y_ODEint)
The issue I am having is that functions like odeint in Python return the entire solution y(t) rather than a single step y(i), as in the line "y_ODEint = odeint(dy_dt,y0,t)".
In my code, the explicit scheme gives y(i) at every time step. I want to do the same with odeint; I tried something (the commented lines), but it didn't work.
I want to obtain y(i) step by step with odeint rather than all the values at once. Is that possible?
Your system is time-variant (the right-hand side depends explicitly on t), so you cannot translate each time step from (t[i-1], t[i]) to (0, delt); every call must be given the actual interval.
Note, though, that step-by-step integration is unstable for your differential equation.
Here is what I get:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
def dy_dt(y,t):
    func = (1/2)*y + 2*np.sin(3*t)
    return func
tmin = 0
tmax = 40
delt= 1e-2
t = np.arange(tmin,tmax,delt)
total_steps = len(t)
y_explicit=np.zeros(total_steps)
#y_ODEint=np.zeros(total_steps)
y0 = -24/37
y_explicit[0]=y0
# exact solution
y_exact = -(24/37)*np.cos(3*t)- (4/37)*np.sin(3*t) + (y0+24/37)*np.exp(0.5*t)
# Solution using ODEint Python
y_ODEint = odeint(dy_dt,y0,t)
# To be filled step by step
y_ODEint_2 = np.zeros_like(y_ODEint)
y_ODEint_2[0] = y0
for i in range(1,len(y_ODEint_2)):
    # update your code to run with the correct time interval
    y_ODEint_2[i] = odeint(dy_dt,y_ODEint_2[i-1],[tmin+(i-1)*delt,tmin+i*delt])[-1]
plt.figure()
plt.plot(t,y_ODEint, label='single run')
plt.plot(t,y_ODEint_2, label='step-by-step')
plt.plot(t, y_exact, label='exact')
plt.legend()
plt.ylim([-20, 20])
plt.grid()
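It is important to notice that both methods are unstable, but the step-by-step version blows up slightly earlier than the single odeint call. The reason is that with y0 = -24/37 the coefficient (y0+24/37) of the growing term exp(0.5*t) in the exact solution is exactly zero, so any truncation or round-off error excites that mode and is then amplified by exp(0.5*t).
With, for example, dy_dt(y,t) = -(1/2)*y + 2*np.sin(3*t), the integration is much better behaved: there is no noticeable error after integrating from 0 to 200.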
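As an alternative to calling odeint once per step, here is a sketch of the same step-by-step idea with `scipy.integrate.solve_ivp`, which takes the actual interval (t[i-1], t[i]) as `t_span`. Note that solve_ivp expects `fun(t, y)`, the reverse of odeint's `func(y, t)`. It uses the stable right-hand side -(1/2)*y + 2*sin(3t) mentioned above; for that equation the exact solution is (4/37)*sin(3t) - (24/37)*cos(3t) + (y0 + 24/37)*exp(-t/2), so with y0 = -24/37 the decaying term vanishes and the result can be compared directly. The step of 0.05 and the tolerances are arbitrary choices for the sketch.

```python
import numpy as np
from scipy.integrate import solve_ivp

def dy_dt(t, y):
    # solve_ivp calls fun(t, y): note the argument order differs from odeint
    return -0.5 * y + 2.0 * np.sin(3.0 * t)

def integrate_step_by_step(fun, y0, t, rtol=1e-8, atol=1e-10):
    """Advance the solution one interval [t[i-1], t[i]] at a time."""
    y = np.empty(len(t))
    y[0] = y0
    for i in range(1, len(t)):
        sol = solve_ivp(fun, (t[i - 1], t[i]), [y[i - 1]], rtol=rtol, atol=atol)
        y[i] = sol.y[0, -1]  # state at the end of this interval
    return y

t = np.arange(0.0, 10.0, 0.05)
y0 = -24 / 37
y_steps = integrate_step_by_step(dy_dt, y0, t)
# Exact solution for this initial value (the exp(-t/2) term has zero coefficient)
y_exact = (4 / 37) * np.sin(3 * t) - (24 / 37) * np.cos(3 * t)
print(np.max(np.abs(y_steps - y_exact)))
```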

Combine two functions with conditional (if) switch in Gekko

I have two thermodynamic relationships for low (300-1000K) and high (1000-3000K) temperatures. If I want to use both of these in Gekko, how can I combine them into a single correlation that I can use in an optimization problem?
Here is a section of Python code that calculates either the low or high temperature relationship from 300K to 3000K.
import numpy as np
import matplotlib.pyplot as plt
T = np.linspace(300.0,3000.0,50)
a_lo = np.array([ 5.15,-1.37E-02,4.92E-05,-4.85E-08,1.67E-11])
a_hi = np.array([7.49E-02,1.34E-02,-5.73E-06,1.22E-09,-1.02E-13])
i_lo = np.where(np.logical_and(T>=300.0, T<1000.0))
i_hi = np.where(np.logical_and(T>=1000.0, T<=3000.0))
cp = np.zeros(50)
Rg = 8.314 # J/mol-K
cp[i_lo] = a_lo[0] + a_lo[1]*T[i_lo] + a_lo[2]*T[i_lo]**2.0 + \
           a_lo[3]*T[i_lo]**3.0 + a_lo[4]*T[i_lo]**4.0
cp[i_hi] = a_hi[0] + a_hi[1]*T[i_hi] + a_hi[2]*T[i_hi]**2.0 + \
           a_hi[3]*T[i_hi]**3.0 + a_hi[4]*T[i_hi]**4.0
cp *= Rg
plt.plot(T,cp,'k-',lw=5)
plt.plot(T[i_lo],cp[i_lo],'.',color='orange')
plt.plot(T[i_hi],cp[i_hi],'.',color='red')
plt.xlabel('Temperature (K)'); plt.grid()
plt.ylabel(r'$CH_4$ Heat Capacity $\left(\frac{J}{mol-K}\right)$')
plt.show()
I tried using a conditional (if) statement when building my model, but it only uses the correlation selected by the initial value of T. Since temperature T is a variable in my model, I want the model to switch between the two correlations based on its value.
There are a few approaches to using a conditional function in an optimization or simulation problem. The first approach is not exact but may be a suitable approximation: a cubic spline that interpolates between sampled points (see approach #1). The second approach is exact but requires either a Mathematical Program with Complementarity Constraints (MPCC) with if2() or an integer switch variable with if3() (see approach #2). Both approaches are discussed on the Design Optimization Course page on Logical Conditions in Optimization.
import numpy as np
import matplotlib.pyplot as plt
from gekko import GEKKO
# CH4 Heat capacity parameters (LO: 300-1000K, HI: 1000K-3000K)
a_lo = np.array([ 5.15,-1.37E-02,4.92E-05,-4.85E-08,1.67E-11])
a_hi = np.array([7.49E-02,1.34E-02,-5.73E-06,1.22E-09,-1.02E-13])
Rg = 8.314 # J/mol-K
m = GEKKO()
# Approach #1: Cubic Spline
def cp1(T):
    if T>=300 and T<=1000:
        a = a_lo
    elif T>1000 and T<=3000:
        a = a_hi
    else:
        raise Exception('Temperature ' + str(T) + ' out of range')
    cp = (a[0]+a[1]*T+a[2]*T**2.0+a[3]*T**3.0+a[4]*T**4.0)*Rg
    return cp
# Calculate cp at 50 pts
T = np.linspace(300.0,3000.0,50)
cp = [cp1(Ti) for Ti in T]
x1 = m.Var(lb=300,ub=3000); y1 = m.Var()
m.cspline(x1,y1,T,cp)
# Approach #2: Gekko conditional statements
def cp2(a,T):
    return (a[0]+a[1]*T+a[2]*T**2.0+a[3]*T**3.0+a[4]*T**4.0)*Rg
x2 = m.Var(lb=300,ub=3000)
y2a = m.Intermediate(cp2(a_lo,x2))
y2b = m.Intermediate(cp2(a_hi,x2))
y2 = m.if3(x2-1000,y2a,y2b)
m.Equation(y1==80)
m.Equation(y2==80)
m.solve()
print('Find Temperature where cp=80 J/mol-K')
print(x1.value[0],x2.value[0])
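Because approach #2 should reproduce the exact piecewise correlation, a quick Gekko-independent check is to solve cp(T) = 80 J/mol-K directly with a bracketing root finder. This sketch uses `scipy.optimize.brentq` on the piecewise function with the question's coefficients. For these coefficients the low-temperature branch stays below 80 J/mol-K (about 74 J/mol-K at 1000 K), so the root lies on the high-temperature branch. The spline of approach #1 is only an approximation, so its answer may differ slightly.

```python
import numpy as np
from scipy.optimize import brentq

# Coefficients copied from the question: cp/Rg as a quartic in T
a_lo = np.array([5.15, -1.37E-02, 4.92E-05, -4.85E-08, 1.67E-11])
a_hi = np.array([7.49E-02, 1.34E-02, -5.73E-06, 1.22E-09, -1.02E-13])
Rg = 8.314  # J/mol-K

def cp_piecewise(T):
    """Heat capacity in J/mol-K, switching coefficient sets at 1000 K."""
    a = a_lo if T < 1000.0 else a_hi
    return Rg * sum(a[k] * T**k for k in range(5))

target = 80.0
# cp - target changes sign between 300 K and 3000 K, so brentq can bracket the root
T_root = brentq(lambda T: cp_piecewise(T) - target, 300.0, 3000.0)
print(T_root)
```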

EWMA Covariance Matrix in Pandas - Optimization

I would like to calculate the EWMA Covariance Matrix from a DataFrame of stock price returns using Pandas and have followed the methodology in PyPortfolioOpt.
I like the flexibility of pandas objects and functions, but as the set of assets grows, the function becomes very slow:
import pandas as pd
import numpy as np
def ewma_cov_pairwise_pd(x, y, alpha=0.06):
    x = x.mask(y.isnull(), np.nan)
    y = y.mask(x.isnull(), np.nan)
    covariation = ((x - x.mean()) * (y - y.mean())).dropna()
    return covariation.ewm(alpha=alpha).mean().iloc[-1]
def ewma_cov_pd(rets, alpha=0.06):
    assets = rets.columns
    n = len(assets)
    cov = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            cov[i, j] = cov[j, i] = ewma_cov_pairwise_pd(
                rets.iloc[:, i], rets.iloc[:, j], alpha=alpha)
    return pd.DataFrame(cov, columns=assets, index=assets)
I would like to improve the speed of the code, ideally while still using pandas, but the bottleneck is the DataFrame.ewm() call, which accounts for 90% of the calculation time.
If using this function were a binding constraint, what would be the most efficient way to speed up the code? I was considering a brute-force approach with concurrent.futures.ProcessPoolExecutor, but perhaps there is a better solution.
n = 100 # n is typically 2000
rets = pd.DataFrame(np.random.normal(0, 1., size=(n, n)))
cov_pd = ewma_cov_pd(rets)
The real time-series data can contain leading nulls, and potentially missing values after that, although the latter is less likely.
Update I
A potential solution, which builds on the answer provided by Quang Hoang and produces the expected results in a far more reasonable time, would be something like:
def ewma_cov_frame_qh(rets, alpha=0.06):
    weights = (1-alpha) ** np.arange(len(rets))[::-1]
    normalized = (rets-rets.mean()).to_numpy()
    out = ((weights * normalized.T) @ normalized) / weights.sum()
    return pd.DataFrame(out, index=rets.columns, columns=rets.columns)
def ewma_cov_qh(rets, alpha=0.06):
    syms = rets.columns
    covar = pd.DataFrame(index=rets.columns, columns=rets.columns)
    delta = rets.isnull().sum(axis=1).shift(1) - rets.isnull().sum(axis=1)
    dates = delta.loc[delta != 0].index.tolist()
    for date in dates:
        frame = rets.loc[rets.index >= date].dropna(axis=1, how='any')
        cov = ewma_cov_frame_qh(frame, alpha=alpha).reindex(index=syms, columns=syms)
        covar = covar.fillna(cov)
    return covar
cov_qh = ewma_cov_qh(rets)
This violates the requirement that the underlying covariance be calculated with the native pandas functions, and the calculation time depends on the number of leading NaNs in the data set.
Update II
A potential improvement on the above which uses (a naive implementation of) multiprocessing and improves the calculation time by a further 42.5% on my machine is listed below:
from concurrent.futures import ProcessPoolExecutor, as_completed
from functools import partial
def ewma_cov_mp_worker(date, rets, alpha=0.06):
    syms = rets.columns
    frame = rets.loc[rets.index >= date].dropna(axis=1, how='any')
    return ewma_cov_frame_qh(frame, alpha=alpha).reindex(index=syms, columns=syms)
def ewma_cov_mp(rets, alpha=0.06):
    covar = pd.DataFrame(index=rets.columns, columns=rets.columns)
    delta = rets.isnull().sum(axis=1).shift(1) - rets.isnull().sum(axis=1)
    dates = delta.loc[delta != 0].index.tolist()
    func = partial(ewma_cov_mp_worker, rets=rets, alpha=alpha)
    # named `executor` rather than `exec`, which would shadow the built-in
    with ProcessPoolExecutor(max_workers=6) as executor:
        future_to_date = {executor.submit(func, date): date for date in dates}
        covs = {future_to_date[future]: future.result() for future in as_completed(future_to_date)}
    for date in dates:
        covar.fillna(covs[date], inplace=True)
    return covar
[I have not added as answer as not addressed the original question and I am optimistic there is a better solution.]
Since you don't really need the ewm series itself (you only take its last value), you can use matrix multiplication instead:
def ewma(df, alpha=0.06):
    # same alpha as ewm(alpha=...) in the question: the newest row gets weight 1
    weights = (1-alpha) ** np.arange(len(df))[::-1]
    # fillna with 0 here
    normalized = (df-df.mean()).fillna(0).to_numpy()
    out = ((weights * normalized.T) @ normalized) / weights.sum()
    return out
# verify
out = ewma(df)
print(np.isclose(out[0,1], ewma_cov_pairwise_pd(df[0], df[1])))
# True
This took about 150 ms on my system with df.shape==(2000,2000), while your code did not finish within several minutes :-).
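A small self-contained check of the matrix-multiplication idea, assuming no missing values: with the default `adjust=True`, pandas' `ewm(alpha=...)` gives the last observation weight 1 and the observation k steps earlier weight (1 - alpha)**k, then divides by the sum of the weights. One weighted matrix product therefore reproduces every pairwise value at once. The sizes and seed are arbitrary, and `ewma_cov_matrix` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def ewma_cov_matrix(rets: pd.DataFrame, alpha: float = 0.06) -> pd.DataFrame:
    """Last value of the EWMA (adjust=True) of de-meaned cross products, for all pairs.
    Assumes rets has no missing values."""
    n = len(rets)
    weights = (1 - alpha) ** np.arange(n)[::-1]   # newest row gets weight 1
    centred = (rets - rets.mean()).to_numpy()
    cov = (weights * centred.T) @ centred / weights.sum()
    return pd.DataFrame(cov, index=rets.columns, columns=rets.columns)

rng = np.random.default_rng(0)
rets = pd.DataFrame(rng.normal(size=(200, 5)))
fast = ewma_cov_matrix(rets)

# Reference value for one pair, computed with one pandas ewm call as in the question
x, y = rets[0], rets[1]
slow = ((x - x.mean()) * (y - y.mean())).ewm(alpha=0.06).mean().iloc[-1]
print(np.isclose(fast.loc[0, 1], slow))  # True
```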

How to use scipy.minimize with multiple parameters in error function?

I have two sets of frequency data, one from an experiment and one from a theoretical formula, and I want to use SciPy's minimize function.
Here's my code snippet,
where g is the coupling I want to find
and ind is the inductance, used for the x-axis.
from scipy.optimize import minimize
def eigenfreq1_func(ind,w_q,w_r,g):
    return (w_q+w_r)+np.sqrt((w_q+w_r)**2.0-4*(w_q+w_r-g**2.0))/2
def eigenfreq2_func(ind,w_q,w_r,g):
    return (w_q+w_r)-np.sqrt((w_q+w_r)**2.0-4*(w_q+w_r-g**2))/2.0
def err_func(y1,y1_fit,y2,y2_fit):
    return np.sqrt((y1-y1_fit)**2+(y2-y2_fit)**2)
g_init=80e6
res1=eigenfreq1_func(ind,qubit_freq,readout_freq,g_init)
print res1
res2=eigenfreq2_func(ind,qubit_freq,readout_freq,g_init)
print res2
fit=minimize(err_func,args=[qubit_freq,res1,readout_freq,res2])
But it's showing the following error :
"TypeError: minimize() takes at least 2 arguments (2 given)"
First, the indentation in your example is messed up. I hope you don't try to run it like that.
Second, here is a baby example that minimizes the chi2 with scipy.optimize.minimize (note that you can minimize whatever you want: a likelihood, |chi|**?, foo, etc.):
import numpy as np
import scipy.optimize as opt
def functionyouwanttofit(x,y,z,t,u):
    return np.array([x+y+z+t+u , x+y+z+t-u , x+y+z-t-u , x+y-z-t-u ]) # baby test here but put what you want
def calc_chi2(parameters):
    x,y,z,t,u = parameters
    data = np.array([100,250,300,500])
    chi2 = np.sum( (data-functionyouwanttofit(x,y,z,t,u))**2 )
    return chi2
# baby example for init, min & max values
x_init = 0
x_min = -1
x_max = 10
y_init = 1
y_min = -2
y_max = 9
z_init = 2
z_min = 0
z_max = 1000
t_init = 10
t_min = 1
t_max = 100
u_init = 10
u_min = 1
u_max = 100
parameters = [x_init,y_init,z_init,t_init,u_init]
bounds = [[x_min,x_max],[y_min,y_max],[z_min,z_max],[t_min,t_max],[u_min,u_max]]
result = opt.minimize(calc_chi2,parameters,bounds=bounds)
In your example you don't give initial values... That, together with the indentation... Were you waiting for someone to do the job for you?
Third, note that the optimizers provided by SciPy are not always suited to your needs. You may prefer a package such as lmfit.
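Applied to the question, the two underlying problems are that `minimize` needs an initial guess `x0` as its second argument, and that the objective must take the parameters being fitted as its first argument and return a single number (the question's `err_func` receives only data and returns an array). Below is a sketch using the question's two model functions as written. The "measurements" are synthetic, generated with g = 0.08 in hypothetical GHz units chosen so the problem is well scaled, so the fit should recover that value. Nelder-Mead with tight tolerances is just one reasonable choice for a single parameter.

```python
import numpy as np
from scipy.optimize import minimize

# Model functions from the question, unchanged (`ind` is accepted but unused there too)
def eigenfreq1_func(ind, w_q, w_r, g):
    return (w_q + w_r) + np.sqrt((w_q + w_r)**2.0 - 4*(w_q + w_r - g**2.0)) / 2

def eigenfreq2_func(ind, w_q, w_r, g):
    return (w_q + w_r) - np.sqrt((w_q + w_r)**2.0 - 4*(w_q + w_r - g**2)) / 2.0

def err_func(params, ind, w_q, w_r, y1_meas, y2_meas):
    """Scalar objective: minimize() varies `params`; the data arrive through `args`."""
    (g,) = params
    r1 = y1_meas - eigenfreq1_func(ind, w_q, w_r, g)
    r2 = y2_meas - eigenfreq2_func(ind, w_q, w_r, g)
    return np.sum(r1**2) + np.sum(r2**2)

# Synthetic "measurements" (hypothetical values) generated with g = 0.08
ind = np.linspace(1.0, 2.0, 50)
w_q = np.linspace(4.0, 6.0, 50)
w_r = 5.0
y1_meas = eigenfreq1_func(ind, w_q, w_r, 0.08)
y2_meas = eigenfreq2_func(ind, w_q, w_r, 0.08)

fit = minimize(err_func, x0=[0.1], args=(ind, w_q, w_r, y1_meas, y2_meas),
               method="Nelder-Mead", options={"xatol": 1e-10, "fatol": 1e-14})
print(fit.x)
```

Since g only enters as g**2, the fit cannot distinguish g from -g; starting from a positive guess keeps it on the positive side here.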

Numerical Fourier Transform of rectangular function

The aim of this post is to properly understand the numerical Fourier transform in Python or MATLAB, using an example whose analytical Fourier transform is well known. For this purpose I chose the rectangular function; its analytical expression and its Fourier transform are given here:
https://en.wikipedia.org/wiki/Rectangular_function
Here is the code in MATLAB:
x = -3 : 0.01 : 3;
y = zeros(length(x));
y(200:400) = 1;
ffty = fft(y);
ffty = fftshift(ffty);
plot(real(ffty))
And here is the code in Python:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-3, 3, 0.01)
y = np.zeros(len(x))
y[200:400] = 1
ffty = np.fft.fft(y)
ffty = np.fft.fftshift(ffty)
plt.plot(np.real(ffty))
In both languages I get the same result, with the same problems.
First of all, the Fourier transform is not real as expected. Moreover, even taking the real part, the solution does not look like the analytical solution: the first plot reported here shows what it should look like, at least in shape, and the second plot is what I get from my calculations.
Could anyone suggest how to correctly calculate the Fourier transform of the rectangular function?
There are two problems in your Matlab code:
First, y = zeros(length(x)); should be y = zeros(1,length(x));. Currently you create a square matrix, not a vector.
Second, the DFT (or FFT) will be real and symmetric if y is. Your y should be symmetric, and that means with respect to 0. So, instead of y(200:400) = 1; use y(1:100) = 1; y(end-98:end) = 1;. Recall that the DFT is like the Fourier series of a signal from which your input is just one period, and the first sample corresponds to time instant 0.
So:
x = -3 : 0.01 : 3;
y = zeros(1,length(x));
y(1:100) = 1; y(end-98:end) = 1;
ffty = fft(y);
ffty = fftshift(ffty);
plot(ffty)
gives
>> isreal(ffty)
ans =
1
The code in Python is
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-3, 3, 0.01)
y = np.zeros(len(x))
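y[200:401] = 1  # 201 samples, symmetric about x[300] = 0
yShift = np.fft.ifftshift(y)  # move x = 0 to index 0
fftyShift = np.fft.fft(yShift)
ffty = np.fft.fftshift(fftyShift)
plt.plot(ffty.real)  # the imaginary part is now only round-off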
plt.show()
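To compare with the analytical result, the DFT has to be scaled by the sample spacing dx and plotted against the frequencies from `np.fft.fftfreq`. The sketch below assumes a unit-height rect on [-1, 1], whose Fourier transform is 2*sinc(2f) with NumPy's normalized `np.sinc(u) = sin(pi*u)/(pi*u)`. Using 201 samples (indices 200 to 400 inclusive) makes the sampled rect exactly symmetric about x = 0, so after `ifftshift` the imaginary part is only round-off. The remaining real-part difference is of order dx, because the sampled rect effectively has width 2.01.

```python
import numpy as np

dx = 0.01
x = np.arange(-3, 3, dx)   # 600 samples; x[300] is x = 0
y = np.zeros(len(x))
y[200:401] = 1.0           # rect on [-1, 1]: 201 samples symmetric about x[300]

# Put x = 0 at index 0, transform, re-centre, and scale by dx to approximate the integral
Y = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(y))) * dx
f = np.fft.fftshift(np.fft.fftfreq(len(x), d=dx))

# Analytical Fourier transform of a unit-height rect of width 2
Y_exact = 2 * np.sinc(2 * f)

low = np.abs(f) <= 5
print(np.max(np.abs(Y.imag)))                       # round-off level: the input is real and even
print(np.max(np.abs(Y.real[low] - Y_exact[low])))   # O(dx) discretisation error
```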
