Monte Carlo in Python

Edited to include VBA code for comparison.
Also, we know the analytical value, 8.021, towards which the Monte Carlo estimate should converge, which makes the comparison easier.
Excel VBA gives 8.067 based on averaging 5 Monte Carlo simulations (7.989, 8.187, 8.045, 8.034, 8.075).
Python gives 7.973 based on 5 simulations (7.913, 7.915, 8.203, 7.739, 8.095), with a larger variance!
The VBA code is not even "that good", using a rather crude way to produce samples from the standard normal!
I am running a very simple piece of Python code to price a European call option via Monte Carlo, and I am surprised at how "bad" the convergence is with 10,000 simulated paths. Usually, when running a Monte Carlo for this simple problem in C++ or even VBA, I get better convergence.
I show the code below (it is taken from the textbook "Python for Finance" and I run it in Visual Studio Code under Python 3.7.7, 64-bit). I get the following results, as an example: Run 1 = 7.913, Run 2 = 7.915, Run 3 = 8.203, Run 4 = 7.739, Run 5 = 8.095.
Results that differ by this much would be unacceptable. How can the convergence be improved? (Obviously by running more paths, but as I said: for 10,000 paths, the result should already have converged much better.)
# Monte Carlo valuation of a European call option
import math
import numpy as np
# Parameter values
S_0 = 100.   # initial value
K = 105.     # strike
T = 1.0      # time to maturity
r = 0.05     # short rate (constant)
sigma = 0.2  # volatility
nr_simulations = 10000
# Valuation algorithm:
# Notice the vectorization below, instead of a loop
z = np.random.standard_normal(nr_simulations)
# Notice that S_T below is a VECTOR!
# (the drift is multiplied by T; with T = 1.0 this leaves the numbers unchanged)
S_T = S_0 * np.exp((r - 0.5 * sigma**2) * T + math.sqrt(T) * sigma * z)
# Call option pay-off at maturity (vector!)
C_T = np.maximum(S_T - K, 0)
# C_0 is a scalar
C_0 = math.exp(-r * T) * np.average(C_T)
print('Value of the European Call is: ', C_0)
I also include the VBA code, which (in my opinion) produces slightly better results: with the code below, I get 7.989, 8.187, 8.045, 8.034, 8.075.
Option Explicit

Sub monteCarlo()
    ' variable declaration
    ' stock initial & final values, option pay-off at maturity
    Dim stockInitial As Double, stockFinal As Double, optionFinal As Double
    ' r = rate, sigma = volatility, strike = strike price
    Dim r As Double, sigma As Double, strike As Double
    ' maturity of the option
    Dim maturity As Double

    ' instantiate variables
    stockInitial = 100#
    r = 0.05
    maturity = 1#
    sigma = 0.2
    strike = 105#

    ' normal is a standard normal draw
    Dim normal As Double
    ' randomNr is a uniform random number between 0 and 1 from Rnd()
    Dim randomNr As Double
    ' variable for storing the final result value
    Dim result As Double

    Dim i As Long, j As Long, monteCarlo As Long
    monteCarlo = 10000

    For j = 1 To 5
        result = 0#
        For i = 1 To monteCarlo
            ' get random nr between 0 and 1
            randomNr = Rnd()
            'max(Rnd(), 0.000000001)
            ' standard normal via inverse CDF
            normal = Application.WorksheetFunction.Norm_S_Inv(randomNr)
            ' drift term multiplied by maturity (= 1 here, so results are unchanged)
            stockFinal = stockInitial * Exp((r - (0.5 * (sigma ^ 2))) * maturity + (sigma * Sqr(maturity) * normal))
            optionFinal = max((stockFinal - strike), 0)
            result = result + optionFinal
        Next i
        result = result / monteCarlo
        result = result * Exp(-r * maturity)
        Worksheets("sheet1").Cells(j, 1) = result
    Next j

    MsgBox "Done"
End Sub

Function max(ByVal number1 As Double, ByVal number2 As Double) As Double
    If number1 > number2 Then
        max = number1
    Else
        max = number2
    End If
End Function

I don't think there is anything wrong with Python or NumPy internals; the convergence should be the same no matter what tool you're using. I ran a few simulations with different sample sizes and different sigma values. No surprise: it turns out the speed of convergence is heavily controlled by the sigma value, see the plot below. Note that the x axis is on a log scale! After the bigger oscillations fade away there are smaller waves before it stabilizes. This is easiest to see at sigma=0.5.
I'm definitely not an expert, but I think the most obvious solution is to increase the sample size, as you mentioned. It would be nice to see the results and code from C++ or VBA, because I don't know how familiar you are with NumPy and Python functions. Maybe something is not doing what you think it's doing.
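As a rough sanity check (my own addition, not part of the original code), the Monte Carlo standard error of a single run can be estimated from the sampled payoffs themselves; for these parameters and n = 10,000 it comes out at roughly 0.13, i.e. a 95% interval of about ±0.26 around the true value, so run-to-run differences of a few tenths are exactly what one should expect:
import math
import numpy as np

S_0, K, T, r, sigma = 100., 105., 1.0, 0.05, 0.2
n = 10000

z = np.random.standard_normal(n)
S_T = S_0 * np.exp((r - 0.5 * sigma**2) * T + math.sqrt(T) * sigma * z)
payoffs = np.exp(-r * T) * np.maximum(S_T - K, 0)

estimate = payoffs.mean()
std_error = payoffs.std(ddof=1) / math.sqrt(n)  # shrinks like 1/sqrt(n)
print(f"estimate = {estimate:.3f} +/- {1.96 * std_error:.3f} (95% CI)")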
Code to generate the plot (let's not talk about efficiency, it's horrible):
import numpy as np
import matplotlib.pyplot as plt
S_0 = 100. # initial value
K = 105. # strike
T = 1.0 # time to maturity
r = 0.05 # short rate (constant)
fig = plt.figure()
ax = fig.add_subplot()
plt.xscale('log')
samplesize = np.geomspace(1000, 20000000, 64)
sigmas = np.arange(0, 0.7, 0.1)
for s in sigmas:
    arr = []
    for n in samplesize:
        n = n.astype(int)
        z = np.random.standard_normal(n)
        S_T = S_0 * np.exp((r-0.5*s**2)+np.sqrt(T)*s*z)
        C_T = np.maximum((S_T-K),0)
        C_0 = np.exp(-r*T)*np.average(C_T)
        arr.append(C_0)
    ax.scatter(samplesize, arr, label=f'sigma={s:.2f}')
plt.tight_layout()
plt.xlabel('Sample size')
plt.ylabel('Value')
plt.grid()
handles, labels = ax.get_legend_handles_labels()
plt.legend(handles[::-1], labels[::-1], loc='upper left')
plt.show()
Addition:
This time you got results closer to the true value using VBA, but sometimes you won't: the effect of randomness is too big here. The truth is that averaging only 5 results from low-sample-size simulations is not meaningful. For example, averaging 50 different simulations in Python (still with only n=10000, even though you shouldn't do that if you want the right answer) yields 8.025167 (± 0.039717 at a 95% confidence level), which is very close to the analytical solution.
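A minimal sketch of that experiment (my reconstruction, not the original script): repeat the n = 10,000 valuation many times and report the mean of the estimates together with a normal-approximation confidence interval.
import math
import numpy as np

S_0, K, T, r, sigma = 100., 105., 1.0, 0.05, 0.2
n, n_runs = 10000, 50

estimates = []
for _ in range(n_runs):
    z = np.random.standard_normal(n)
    S_T = S_0 * np.exp((r - 0.5 * sigma**2) * T + math.sqrt(T) * sigma * z)
    estimates.append(math.exp(-r * T) * np.maximum(S_T - K, 0).mean())

estimates = np.array(estimates)
half_width = 1.96 * estimates.std(ddof=1) / math.sqrt(n_runs)
print(f"mean of {n_runs} runs: {estimates.mean():.6f} +/- {half_width:.6f}")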

Related

Adding constraints to my fitting model using lmfit

I am trying to fit a complex conductivity model (the Drude-Smith-Anderson model) using lmfit.minimize. In that fit, I want constraints on my parameters c and c1 such that 0<c<1, -1<c1<0 and 0<1+c1-c<1. So I am using the following code:
#reference: Juluri B.K. "Fitting Complex Metal Dielectric Functions with Differential Evolution Method". http://juluribk.com/?p=1597.
#reference: https://lmfit.github.io/lmfit-py/fitting.html
#import libraries (numdifftools needs to be installed but doesn't need to be imported)
import matplotlib.pyplot as plt
import numpy as np
import lmfit as lmf
import math as mt
#define the complex conductivity model
def model(params, w):
    sigma0 = params["sigma0"].value
    tau = params["tau"].value
    c = params["c"].value
    d = params["d"].value
    c1 = params["c1"].value
    druidanderson = (sigma0/(1-1j*2*mt.pi*w*tau))*(1 + c1/(1-1j*2*mt.pi*w*tau)) - sigma0*c/(1-1j*2*mt.pi*w*d*tau)
    return druidanderson

#defining the complex residues (chi squared is sum of squares of residues)
def complex_residuals(params, w, exp_data):
    delta = model(params, w)
    residual = (abs((delta.real - exp_data.real) / exp_data.real)
                + abs((delta.imag - exp_data.imag) / exp_data.imag))
    return residual
# importing data from CSV file
importpath = input("Path of CSV file: ") #Asking the location of where your data file is kept (give input in form of path\name.csv)
frequency = np.genfromtxt(rf"{importpath}",delimiter=",", usecols=(0)) #path to be changed to the file from which data is taken
conductivity = np.genfromtxt(rf"{importpath}",delimiter=",", usecols=(1)) + 1j*np.genfromtxt(rf"{importpath}",delimiter=",", usecols=(2)) #path to be changed to the file from which data is taken
frequency = frequency[np.logical_not(np.isnan(frequency))]
conductivity = conductivity[np.logical_not(np.isnan(conductivity))]
w_for_fit = frequency
eps_for_fit = conductivity
#defining the bounds and initial guesses for the fitting parameters
params = lmf.Parameters()
params.add("sigma0", value = float(input("Guess for \u03C3\u2080: ")), min =10 , max = 5000) #bounds have to be changed manually
params.add("tau", value = float(input("Guess for \u03C4: ")), min = 0.0001, max =10) #bounds have to be changed manually
params.add("c1", value = float(input("Guess for c1: ")), min = -1 , max = 0) #bounds have to be changed manually
params.add("constraint", value = float(input("Guess for constraint: ")), min = 0, max=1)
params.add("c", expr="1+c1-constraint", min = 0, max = 1) #bounds have to be changed manually
params.add("d", value = float(input("Guess for \u03C4_1/\u03C4: ")),min = 100, max = 100000) #bounds have to be changed manually
# minimizing the chi square
minimizer_results = lmf.minimize(complex_residuals, params, args=(w_for_fit, eps_for_fit),
                                 method='differential_evolution', strategy='best1bin',
                                 popsize=50, tol=0.01, mutation=(0, 1), recombination=0.9,
                                 seed=None, callback=None, disp=True, polish=True, init='latinhypercube')
lmf.printfuncs.report_fit(minimizer_results, show_correl=False)
As a result for the fit, I get the following output:
sigma0: 3489.38961 (init = 1000)
tau: 1.2456e-04 (init = 0.01)
c1: -0.99816132 (init = -1)
constraint: 0.98138820 (init = 1)
c: 0.00000000 == '1+c1-constraint'
d: 7333.82306 (init = 1000)
These values don't make any sense: c is defined as 1+c1-constraint, which evaluates to -0.97954952, outside the required range (0, 1), yet it is reported as 0. How can I fix this issue?
Your code is not runnable. The use of input() is sort of stunning - please do not do that. Write code that is pleasant to read and separates i/o from logic.
To make a floating point residual from a complex array, use complex_array.view(float)
Guessing any parameter value to be at or very close to its limit (here, c) is a very bad idea, likely to make the fit harder.
More to your question, you defined c as "evaluate 1+c1-constraint and then apply the bounds min=0, max=1". That is literally, precisely, and exactly what your
params.add("c", expr="1+c1-constraint", min = 0, max = 1)
means: calculate c as 1+c1-constraint, and then apply the bounds [0, 1]. The code is doing exactly what you told it to do.
Unless you know what you are doing (I suspect maybe not ;)), I would strongly advise doing a fit with the default leastsq method before trying to use differential_evolution. It turns out that differential_evolution is not a very good global fitting method (shgo is generally better, though no "global" solver should be considered as very reliable). But, unless you know that you need such a method, you probably do not.
I would also strongly advise you to plot your data and some models evaluated with what you think are reasonable parameters.
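A minimal sketch of that advice (my own illustration, with synthetic data and made-up starting values, not the asker's setup): return a float residual from the complex difference via .view(float) and fit with the default leastsq method first.
import numpy as np
import lmfit as lmf

def model(params, w):
    sigma0, tau = params["sigma0"].value, params["tau"].value
    c, c1, d = params["c"].value, params["c1"].value, params["d"].value
    return (sigma0/(1 - 1j*2*np.pi*w*tau))*(1 + c1/(1 - 1j*2*np.pi*w*tau)) - sigma0*c/(1 - 1j*2*np.pi*w*d*tau)

def residuals(params, w, data):
    # complex difference -> interleaved real/imag floats, as suggested above
    return (model(params, w) - data).view(float)

# synthetic data, purely for illustration
w = np.linspace(0.1, 2.0, 200)
true = lmf.Parameters()
for name, val in [("sigma0", 3000), ("tau", 0.2), ("c", 0.3), ("c1", -0.5), ("d", 500)]:
    true.add(name, value=val)
data = model(true, w) + 0.01*(np.random.randn(w.size) + 1j*np.random.randn(w.size))

params = lmf.Parameters()
params.add("sigma0", value=1000, min=10, max=5000)
params.add("tau", value=0.1, min=1e-4, max=10)
params.add("c1", value=-0.4, min=-1, max=0)
params.add("constraint", value=0.5, min=0, max=1)  # constraint = 1 + c1 - c
params.add("c", expr="1 + c1 - constraint")
params.add("d", value=1000, min=100, max=100000)

result = lmf.minimize(residuals, params, args=(w, data))  # default leastsq
lmf.report_fit(result)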

Exponential fit in pandas

I have this data:
puf = pd.DataFrame({'id':[1,2,3,4,5,6,7,8],
'val':[850,1889,3289,6083,10349,17860,28180,41236]})
The data seems to follow an exponential curve. Let's see the plot:
puf.plot('id','val')
I want to fit an exponential curve, $y = Ae^{Bx}$ (A times e to the B·x), and add it as a column in pandas. First I tried taking the log of the values:
puf['log_val'] = np.log(puf['val'])
And then to use Numpy to fit the equation:
puf['fit'] = np.polyfit(puf['id'],puf['log_val'],1)
But I get an error:
ValueError: Length of values (2) does not match length of index (8)
My expected result is the fitted values as a new column in Pandas. I attach an image with the column fitted values I want (in orange):
I'm stuck in this code. I'm not sure what I am doing wrong. How can I create a new column with my fitted values?
Note that you asked for an exponential model, yet you have the results for a log-linear model.
Check out the work below:
For the log-linear model we are fitting E(log(Y)), i.e. the residual is log(y) - (log(b[0]) + b[1]*x):
from scipy.optimize import least_squares
least_squares(lambda b: np.log(puf['val']) - (np.log(b[0]) + b[1] * puf['id']),
              [1, 1])['x']
array([5.99531305e+02, 5.51106793e-01])
These are the values that excel gives.
On the other hand, to fit an exponential curve the randomness is on Y itself and not on its logarithm, i.e. E(Y) = b[0]*exp(b[1]*x). Hence we have:
least_squares(lambda b: puf['val'] - b[0]*np.exp(b[1] * puf['id']), [0, 1])['x']
array([1.08047304e+03, 4.58116127e-01]) # correct results for exponential fit
Depending on your model choice, the values are a little different.
Better Model? Since you have same number of parameters, consider the one that gives you lower deviance or better out of sample prediction
Note that the exponential model is E(Y) = A'B'^X, which for comparison can be written as log(E(Y)) = A + XB, while the log-linear model is E(log(Y)) = A + XB. Note the difference in where the expectation is taken.
From the two models we have:
Notice how the log-linear fit overestimates at the higher values, while the exponential fit overestimates at the lower values.
Code for image:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import least_squares
log_lin = least_squares(lambda b: np.log(puf['val']) - (np.log(b[0]) + b[1] * puf['id']),
                        [1, 1])['x']
expo = least_squares(lambda b: puf['val'] - b[0]*np.exp(b[1] * puf['id']), [0, 1])['x']
exp_fun = lambda x: expo[0] * np.exp(expo[1]*x)
log_lin_fun = lambda x: log_lin[0] * np.exp(log_lin[1]*x)
plt.plot(puf.id, puf.val, label='original')
plt.plot(puf.id, exp_fun(puf.id), label='exponential')
plt.plot(puf.id, log_lin_fun(puf.id), label='log-linear')
plt.legend()
You're getting that error because np.polyfit(puf['id'], puf['log_val'], 1) returns two values, array([0.55110679, 6.39614819]), which doesn't match the length of your dataframe.
This is what you want
y = a * exp(b*x)  ->  ln(y) = ln(a) + b*x
f = np.polyfit(puf['id'], np.log(puf['val']), 1)
where
a = np.exp(f[1]) -> 599.5313046712091
b = f[0] -> 0.5511067934637022
Giving
puf['fit'] = a * np.exp(b * puf['id'])
id val fit
0 1 850 1040.290193
1 2 1889 1805.082864
2 3 3289 3132.130026
3 4 6083 5434.785677
4 5 10349 9430.290286
5 6 17860 16363.179739
6 7 28180 28392.938399
7 8 41236 49266.644002
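For completeness, a direct non-linear fit of $y = Ae^{Bx}$ can also be done with scipy.optimize.curve_fit and added as a column in the same way (a sketch, with the starting guess taken from the log-linear fit above):
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

puf = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                    'val': [850, 1889, 3289, 6083, 10349, 17860, 28180, 41236]})

def exp_model(x, a, b):
    return a * np.exp(b * x)

# start near the log-linear solution so the optimizer converges easily
(a, b), _ = curve_fit(exp_model, puf['id'], puf['val'], p0=(600, 0.55))
puf['fit'] = exp_model(puf['id'], a, b)
print(puf)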

Checking if Frequentist approach is correct? Bayesian approach using MCMC for AB test. How to calculate Bayes Factors in Python?

I've been trying to get my head around Frequentist and Bayesian approaches for a toy data AB test problem.
The results don't really make sense to me, and I am struggling to tell whether I have computed them correctly (quite possibly not). Furthermore, after much research, I am still somewhat lost as to how to compute Bayes factors. I've seen packages in R that make this look somewhat easy; alas, I am not familiar with R and would prefer to solve this problem in Python.
I would greatly appreciate any help and guidance regarding this!
Here is the data:
# imports
import pingouin as pg
import pymc3 as pm
import pandas as pd
import numpy as np
import scipy.stats as scs
import statsmodels.stats.api as sms
import math
import matplotlib.pyplot as plt
# A = control -- B = treatment
a_success = 10730
a_failure = 61988
a_total = a_success + a_failure
a_cr = a_success / a_total
b_success = 10966
b_failure = 60738
b_total = b_success + b_failure
b_cr = b_success / b_total
I started by doing some power analysis, to determine the number of required samples with a power of 0.8, alpha of 0.05 and a practical significance of 2%. I'm not sure whether expected conversion rates should be supplied, or the baseline + some proportion. Depending on the effect size, the required number of samples increases significantly.
# determine required sample size
baseline_rate = a_cr
practical_significance = 0.02
alpha = 0.05
power = 0.8
nobs1 = None
# is this how to calculate effect size?
effect_size = sms.proportion_effectsize(baseline_rate, baseline_rate + practical_significance) # 5204
# # or this?
# effect_size = sms.proportion_effectsize(baseline_rate, baseline_rate + baseline_rate * practical_significance) # 228583
sample_size = sms.NormalIndPower().solve_power(effect_size=effect_size,
                                               power=power,
                                               alpha=alpha,
                                               nobs1=nobs1,
                                               ratio=1)
I continued trying to determine if the null hypothesis could be rejected:
# calculate pooled probability
pooled_probability = (a_success + b_success) / (a_total + b_total)
# calculate pooled standard error and margin of error
se_pooled = math.sqrt(pooled_probability * (1 - pooled_probability) * (1 / b_total + 1 / a_total))
z_score = scs.norm.ppf(1 - alpha / 2)
margin_of_error = se_pooled * z_score
# the estimated difference between probability of conversions of both groups
d_hat = (b_success / b_total) - (a_success / a_total)
# test if null hypothesis can be rejected
lower_bound = d_hat - margin_of_error
upper_bound = d_hat + margin_of_error
if practical_significance < lower_bound:
    print("reject null hypothesis -- groups do not have the same conversion rates")
else:
    print("do not reject the null hypothesis -- groups have the same conversion rates")
which evaluates to 'do not reject the null ...' despite group B (treatment) showing a 3.65% relative improvement in conversion rate over group A (control), which seems... odd?
I tried a slightly different approach (I guess a slightly different hypothesis?):
successes = [a_success, b_success]
nobs = [a_total, b_total]
z_stat, p_value = sms.proportions_ztest(successes, nobs=nobs)
(lower_a, lower_b), (upper_a, upper_b) = sms.proportion_confint(successes, nobs=nobs, alpha=alpha)
if p_value < alpha:
    print("reject null hypothesis -- groups do not have the same conversion rates")
else:
    print("do not reject the null hypothesis -- groups have the same conversion rates")
Which evaluates to 'reject null hypothesis ... ' with p-value: 0.004236. This seems highly contradictory, especially since the p-value is < 0.01.
On to Bayes... I created some arrays of successes and failures (and only tested on the first 1,000 observations, due to how long this thing takes) and ran the following:
# generate lists of 1, 0
obs_a = np.repeat([1, 0], [a_success, a_failure])
obs_b = np.repeat([1, 0], [b_success, b_failure])
for _ in range(10):
    np.random.shuffle(obs_a)
    np.random.shuffle(obs_b)
with pm.Model() as model:
    p_A = pm.Beta("p_A", 1, 1)
    p_B = pm.Beta("p_B", 1, 1)
    delta = pm.Deterministic("delta", p_A - p_B)
    obs_A = pm.Bernoulli("obs_A", p_A, observed=obs_a[:1000])
    obs_B = pm.Bernoulli("obs_B", p_B, observed=obs_b[:1000])
    step = pm.NUTS()
    trace = pm.sample(1000, step=step, chains=2)
Firstly, I understand that you are supposed to burn some proportion of the trace -- how do you determine an appropriate number of indices to burn?
In trying to evaluate the posterior probabilities, is the following code the correct way to do this?
b_lift = (trace['p_B'].mean() - trace['p_A'].mean()) / trace['p_A'].mean() * 100
b_prob = np.mean(trace["delta"] > 0)
a_lift = (trace['p_A'].mean() - trace['p_B'].mean()) / trace['p_B'].mean() * 100
a_prob = np.mean(trace["delta"] < 0)
# is the Bayes Factor just the ratio of the posterior probabilities for these two models?
BF = (trace['p_B'] / trace['p_A']).mean()
print(f'There is {b_prob} probability B outperforms A by a magnitude of {round(b_lift, 2)}%')
print(f'There is {a_prob} probability A outperforms B by a magnitude of {round(a_lift, 2)}%')
print('BF:', BF)
-- output:
There is 0.666 probability B outperforms A by a magnitude of 1.29%
There is 0.334 probability A outperforms B by a magnitude of -1.28%
BF: 1.013357654428127
I suspect that this is not the correct way to calculate Bayes Factors. How can the Bayes Factor be calculated?
I really hope you can help me understand all of the above... I realize it's an exceptionally long post. But I've tried every resource I can find and am still stuck!
Kind regards.

Exponential Moving Average by time interval [duplicate]

I have a range of dates and a measurement on each of those dates. I'd like to calculate an exponential moving average for each of the dates. Does anybody know how to do this?
I'm new to python. It doesn't appear that averages are built into the standard python library, which strikes me as a little odd. Maybe I'm not looking in the right place.
So, given the following code, how could I calculate the moving weighted average of IQ points for calendar dates?
from datetime import date
days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
IQ = [110, 105, 90]
(there's probably a better way to structure the data, any advice would be appreciated)
EDIT:
It seems that the mov_average_expw() function from the scikits.timeseries.lib.moving_funcs submodule of SciKits (add-on toolkits that complement SciPy) better suits the wording of your question.
To calculate an exponential smoothing of your data with a smoothing factor alpha (it is (1 - alpha) in Wikipedia's terms):
>>> alpha = 0.5
>>> assert 0 < alpha <= 1.0
>>> today = max(days)
>>> av = sum(alpha**(today - day).days * iq
...          for day, iq in sorted(zip(days, IQ), key=lambda p: p[0], reverse=True))
>>> av
95.0
The above is not pretty, so let's refactor it a bit:
from collections import namedtuple
from operator import itemgetter

def smooth(iq_data, alpha=1, today=None):
    """Perform exponential smoothing with factor `alpha`.

    Time period is a day.
    Each time period the value of `iq` drops `alpha` times.
    The most recent data is the most valuable one.
    """
    assert 0 < alpha <= 1
    if alpha == 1:  # no smoothing
        return sum(map(itemgetter(1), iq_data))
    if today is None:
        today = max(map(itemgetter(0), iq_data))
    return sum(alpha**((today - date).days) * iq for date, iq in iq_data)

IQData = namedtuple("IQData", "date iq")

if __name__ == "__main__":
    from datetime import date
    days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
    IQ = [110, 105, 90]
    iqdata = list(map(IQData, days, IQ))
    print("\n".join(map(str, iqdata)))
    print(smooth(iqdata, alpha=0.5))
Example:
$ python26 smooth.py
IQData(date=datetime.date(2008, 1, 1), iq=110)
IQData(date=datetime.date(2008, 1, 2), iq=105)
IQData(date=datetime.date(2008, 1, 7), iq=90)
95.0
I'm always calculating EMAs with Pandas. Here is an example of how to do it:
import pandas as pd
import numpy as np
def ema(values, period):
    # pd.ewma() was removed from newer pandas; the .ewm() accessor is its replacement
    return pd.Series(np.array(values)).ewm(span=period).mean().iloc[-1]
values = [9, 5, 10, 16, 5]
period = 5
print(ema(values, period))
More info about Pandas EWMA:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ewma.html
I did a bit of googling and I found the following sample code (http://osdir.com/ml/python.matplotlib.general/2005-04/msg00044.html):
import numpy as np

def ema(s, n):
    """
    returns an n period exponential moving average for
    the time series s

    s is a list ordered from oldest (index 0) to most
    recent (index -1)
    n is an integer

    returns a numeric array of the exponential
    moving average
    """
    s = np.array(s)
    ema = []
    j = 1
    # get n sma first and calculate the next n period ema
    sma = sum(s[:n]) / n
    multiplier = 2 / float(1 + n)
    ema.append(sma)
    # EMA(current) = ((Price(current) - EMA(prev)) x Multiplier) + EMA(prev)
    ema.append(((s[n] - sma) * multiplier) + sma)
    # now calculate the rest of the values
    for i in s[n+1:]:
        tmp = ((i - ema[j]) * multiplier) + ema[j]
        j = j + 1
        ema.append(tmp)
    return ema
You can also use the SciPy filter method because the EMA is an IIR filter. This will have the benefit of being approximately 64 times faster as measured on my system using timeit on large data sets when compared to the enumerate() approach.
import numpy as np
from scipy.signal import lfilter
x = np.random.normal(size=1234)
alpha = .1 # smoothing coefficient
zi = [x[0]] # seed the filter state with first value
# filter can process blocks of continuous data if <zi> is maintained
y, zi = lfilter([1.-alpha], [1., -alpha], x, zi=zi)
I don't know Python, but for the averaging part, do you mean an exponentially decaying low-pass filter of the form
y_new = y_old + (input - y_old)*alpha
where alpha = dt/tau, dt = the timestep of the filter, tau = the time constant of the filter? (the variable-timestep form of this is as follows, just clip dt/tau to not be more than 1.0)
y_new = y_old + (input - y_old)*dt/tau
If you want to filter something like a date, make sure you convert to a floating-point quantity like # of seconds since Jan 1 1970.
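A small Python sketch of that idea (my own addition, with made-up names): convert each date to seconds, compute dt per step, and clip dt/tau at 1.0.
from datetime import date, datetime

def ema_irregular(dates, values, tau_seconds):
    # exponentially decaying low-pass filter with a variable time step
    times = [datetime.combine(d, datetime.min.time()).timestamp() for d in dates]
    out = [values[0]]
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        alpha = min(dt / tau_seconds, 1.0)  # clip so the filter stays stable
        out.append(out[-1] + (values[i] - out[-1]) * alpha)
    return out

days = [date(2008, 1, 1), date(2008, 1, 2), date(2008, 1, 7)]
IQ = [110, 105, 90]
print(ema_irregular(days, IQ, tau_seconds=2 * 24 * 3600))  # tau = 2 days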
My python is a little bit rusty (anyone can feel free to edit this code to make corrections, if I've messed up the syntax somehow), but here goes....
def movingAverageExponential(values, alpha, epsilon=0):
    if not 0 < alpha < 1:
        raise ValueError("out of range, alpha='%s'" % alpha)
    if not 0 <= epsilon < alpha:
        raise ValueError("out of range, epsilon='%s'" % epsilon)
    result = [None] * len(values)
    for i in range(len(result)):
        currentWeight = 1.0
        numerator = 0
        denominator = 0
        for value in values[i::-1]:
            numerator += value * currentWeight
            denominator += currentWeight
            currentWeight *= alpha
            if currentWeight < epsilon:
                break
        result[i] = numerator / denominator
    return result
For each position i, the function works backward from values[i] toward the beginning of the list, accumulating weighted values until the weight coefficient for an element drops below the given epsilon, and then stores the normalized result in result[i].
(SIDE NOTE: the result list is pre-allocated with [None] * len(values) and filled in place, so nothing needs to be reversed at the end. In Python lists, appending is much less expensive than prepending, so building in place or appending is the cheaper option. Please correct me if I'm wrong.)
The 'alpha' argument is the decay factor on each iteration. For example, if you used an alpha of 0.5, then today's moving average value would be composed of the following weighted values:
today: 1.0
yesterday: 0.5
2 days ago: 0.25
3 days ago: 0.125
...etc...
Of course, if you've got a huge array of values, the values from ten or fifteen days ago won't contribute very much to today's weighted average. The 'epsilon' argument lets you set a cutoff point, below which you will cease to care about old values (since their contribution to today's value will be insignificant).
You'd invoke the function something like this:
result = movingAverageExponential(values, 0.75, 0.0001)
The matplotlib.org examples (http://matplotlib.org/examples/pylab_examples/finance_work2.html) provide one good example of an Exponential Moving Average (EMA) function using NumPy:
import numpy as np

def moving_average(x, n, type):
    x = np.asarray(x)
    if type == 'simple':
        weights = np.ones(n)
    else:
        weights = np.exp(np.linspace(-1., 0., n))
    weights /= weights.sum()
    a = np.convolve(x, weights, mode='full')[:len(x)]
    a[:n] = a[n]
    return a
I found the code snippet by @earino above pretty useful - but I needed something that could continuously smooth a stream of values - so I refactored it to this:
def exponential_moving_average(period=1000):
    """Exponential moving average. Smooths the values in v over the period.

    Send in values - at first it'll return a simple average, but as soon as
    it's gathered 'period' values, it'll start to use the Exponential Moving
    Average to smooth the values.
    period: int - how many values to smooth over (default=1000).
    """
    multiplier = 2 / float(1 + period)
    cum_temp = yield None  # We are being primed
    # Start by just returning the simple average until we have enough data.
    for i in range(1, period + 1):
        cum_temp += yield cum_temp / float(i)
    # Grab the simple average
    ema = cum_temp / period
    # and start calculating the exponentially smoothed average
    while True:
        ema = (((yield ema) - ema) * multiplier) + ema
and I use it like this:
def temp_monitor(pin):
    """Read from the temperature monitor - and smooth the value out.

    The sensor is noisy, so we use exponential smoothing.
    """
    ema = exponential_moving_average()
    next(ema)  # Prime the generator
    while True:
        yield ema.send(val_to_temp(pin.read()))
(where pin.read() produces the next value I'd like to consume).
Maybe the shortest:
#Specify decay in terms of span
#data_series should be a DataFrame
ema=data_series.ewm(span=5, adjust=False).mean()
import pandas_ta as ta
data["EMA3"] = ta.ema(data["close"], length=3)
pandas_ta is a technical analysis library: https://github.com/twopirllc/pandas-ta. The code above calculates the Exponential Moving Average (EMA) for a series. You can specify the lag value using 'length'. Specifically, the code above calculates the 3-day EMA.
Here is a simple sample I worked up based on http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages
Note that unlike in their spreadsheet, I don't calculate the SMA, and I don't wait to generate the EMA after 10 samples. This means my values differ slightly, but if you chart it, it follows exactly after 10 samples. During the first 10 samples, the EMA I calculate is appropriately smoothed.
def emaWeight(numSamples):
    return 2 / float(numSamples + 1)

def ema(close, prevEma, numSamples):
    return ((close - prevEma) * emaWeight(numSamples)) + prevEma

samples = [
    22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43, 22.24, 22.29,
    22.15, 22.39, 22.38, 22.61, 23.36, 24.05, 23.75, 23.83, 23.95, 23.63,
    23.82, 23.87, 23.65, 23.19, 23.10, 23.33, 22.68, 23.10, 22.40, 22.17,
]
emaCap = 10
e = samples[0]
for s in range(len(samples)):
    numSamples = emaCap if s > emaCap else s
    e = ema(samples[s], e, numSamples)
    print(e)
I'm a little late to the party here, but none of the solutions given were what I was looking for. Nice little challenge using recursion and the exact formula given in investopedia.
No numpy or pandas required.
prices = [{'i': 1, 'close': 24.5}, {'i': 2, 'close': 24.6}, {'i': 3, 'close': 24.8},
          {'i': 4, 'close': 24.9}, {'i': 5, 'close': 25.6}, {'i': 6, 'close': 25.0},
          {'i': 7, 'close': 24.7}]

def rec_calculate_ema(n):
    k = 2 / (n + 1)
    price = prices[n]['close']
    if n == 1:
        return price
    res = (price * k) + (rec_calculate_ema(n - 1) * (1 - k))
    return res

print(rec_calculate_ema(3))
A fast way (copy-pasted from here) is the following:
import numpy as np

def ExpMovingAverage(values, window):
    """Numpy implementation of EMA"""
    weights = np.exp(np.linspace(-1., 0., window))
    weights /= weights.sum()
    a = np.convolve(values, weights, mode='full')[:len(values)]
    a[:window] = a[window]
    return a
I am using a list and a rate of decay as inputs. I hope this little function with just two lines may help you here, considering deep recursion is not stable in python.
def expma(aseries, ratio):
    return sum([ratio * aseries[-x-1] * ((1 - ratio)**x) for x in range(len(aseries))])
more simply, using pandas
def EMA(tw):
    for x in tw:
        data["EMA{}".format(x)] = data['close'].ewm(span=x, adjust=False).mean()
EMA([10, 50, 100])
Papahaba's answer was almost what I was looking for (thanks!) but I needed to match initial conditions. Using an IIR filter with scipy.signal.lfilter is certainly the most efficient. Here's my redux:
Given a NumPy vector, x
import numpy as np
from scipy import signal
period = 12
b = np.array((1,), 'd')
a = np.array((period, 1-period), 'd')
zi = signal.lfilter_zi(b, a)
y, zi = signal.lfilter(b, a, x, zi=zi*x[0:1])
Get the N-point EMA (here, 12) returned in the vector y
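Since the original question is about calendar dates with uneven gaps, it may also be worth noting that recent pandas (1.1+) can weight by actual time distance via the times argument of ewm(); a minimal sketch using the question's data (the two-day half-life is an arbitrary choice):
import pandas as pd
from datetime import date

days = [date(2008, 1, 1), date(2008, 1, 2), date(2008, 1, 7)]
IQ = [110, 105, 90]

df = pd.DataFrame({'date': pd.to_datetime(days), 'iq': IQ})
# halflife is given as a timedelta, so the 5-day gap is weighted accordingly
df['ema'] = df['iq'].ewm(halflife=pd.Timedelta(days=2), times=df['date']).mean()
print(df)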

Incorporating a function into an ODE integration

This question is probably very simple but for the life of me I can't figure it out. Basically, I have a neuron whose voltage I'm modeling, but I have it receiving input spikes from other neurons randomly. So a friend of mine helped to create a function that essentially has some excitatory neurons provide a random Poisson spike which increases the voltage randomly and some inhibitory neurons providing downward spikes lowering the voltage. I've included the code below. Basically the step I'm trying to figure out how to do is how to make the I_syn term in the iterative step work. I would normally think to just write I_syn[i-1], but that gives me an error:
'function' object has no attribute '__getitem__'.
So I'm sure this question is really simple, but it's a problem I don't know how to overcome. How do I get this program to iterate the I_syn term properly so I can do a basic iterative scheme of an ODE while including a function defined previously in the code? It's important because I'll likely have more complicated neuron equations in the near future, so it would be much better to write the functions beforehand and then call them into the iteration step as needed. Thank you!
from numpy import *
from pylab import *
import numpy as np

## setup parameters and state variables
T = 50                      # total time to simulate (msec)
dt = 0.125                  # simulation time step (msec)
time = arange(0, T+dt, dt)  # time array
t_rest = 0                  # initial refractory time

## LIF properties
Vm = zeros(len(time))       # potential (V) trace over time
Rm = 1                      # resistance (kOhm)
Cm = 10                     # capacitance (uF)
tau_m = Rm*Cm               # time constant (msec)
tau_ref = 4                 # refractory period (msec)
Vth = 1                     # spike threshold (V)
V_spike = 0.5               # spike delta (V)

## Stimulus
I = 1.5                     # input current (A)
N = 1000
N_ex = 0.8*N                # (0..79)
N_in = 0.2*N                # (80..99)
G_ex = 0.1
K = 4

def I_syn(spks, t):
    """
    Synaptic current
    spks = [[synid, t],]
    """
    if len(spks) == 0:
        return 0
    exspk = spks[spks[:,0] < N_ex]   # Check for all excitatory spikes
    delta_k = exspk[:,1] == t        # Delta function
    if np.any(delta_k) > 0:
        h_k = np.random.rand(len(delta_k)) < 0.90  # probability of successful transmission
    else:
        h_k = 0
    inspk = spks[spks[:,0] >= N_ex]  # Check remaining neurons for inhibitory spikes
    delta_m = inspk[:,1] == t        # Delta function for inhibitory neurons
    if np.any(delta_m) > 0:
        h_m = np.random.rand(len(delta_m)) < 0.90
    else:
        h_m = 0
    isyn = Cm*G_ex*(np.sum(h_k*delta_k) - K*np.sum(h_m*delta_m))
    return isyn

## iterate over each time step
for i, t in enumerate(time):
    if t > t_rest:
        Vm[i] = Vm[i-1] + (-Vm[i-1] + I_syn*Rm) / tau_m * dt
        if Vm[i] >= Vth:
            Vm[i] += V_spike
            t_rest = t + tau_ref

## plot membrane potential trace
plot(time, Vm)
title('Leaky Integrate-and-Fire Example')
ylabel('Membrane Potential (V)')
xlabel('Time (msec)')
ylim([0,2])
show()
I_syn is just a function so using I_syn[i-1] will throw this error:
'function' object has no attribute '__getitem__'
If what you are looking for is a return value from the function, then you should first call it and then access what you want.
# pass related arguments as well since the function expects it
I_syn(arg1, arg2)[i-1]
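For example (a hypothetical sketch, assuming the spike events have been collected beforehand in an array spks shaped like [[synid, t], ...] as the docstring suggests), the update step would call the function at the current time and use the returned scalar directly:
# hypothetical: spks holds the pre-generated [synid, spike_time] rows
for i, t in enumerate(time):
    if t > t_rest:
        Vm[i] = Vm[i-1] + (-Vm[i-1] + I_syn(spks, t)*Rm) / tau_m * dt
        if Vm[i] >= Vth:
            Vm[i] += V_spike
            t_rest = t + tau_ref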
