Exercise on calculating and plotting cumulated empirical distribution

Exercise on calculating and plotting cumulated empirical distribution - python

I was trying to finish an exercise in Jonh Stachurski's book (a textbook devoted to teach economists how to use Python). One of these is about how to calculate and plot cumulated empirical distribution. They provide a class called ecdf to calculate empirical distribution function
# Filename: ecdf.py
# Author: John Stachurski
# Date: December 2008
# Corresponds to: Listing 6.3
class ECDF:
def __init__(self, observations):
self.observations = observations
def __call__(self, x):
counter = 0.0
for obs in self.observations:
if obs <= x:
counter += 1
return counter / len(self.observations)
And the excercise reads
【Exercise 6.1.12】 Add a method to the ECDF class that uses Matplotlib to plot the em-
pirical distribution over a speciﬁed interval. Replicate the four graphs in ﬁgure 6.3
(modulo randomness).
the figure is need to be replicated is
and an illusion of algorithm
The following is my initial attempt
from ecdf import ECDF
import numpy as np
import matplotlib.pyplot as plt
from srs import SRS
from math import sqrt
from random import lognormvariate
# =========================
# parameters and arguments
# =========================
alpha, sigma2, s, delta = 0.3, 0.2, 0.5, 0.1
# numbers of draws
n = 1000
# length of each markov chain
t = 20
num_simu = [4,25,100,5000]
# Define F(k, z) = s k^alpha z + (1 - delta) k
F = lambda k, z: s * (k**alpha) * z + (1 - delta) * k
lognorm = lambda: lognormvariate(0, sqrt(sigma2))
# =====================
# create empirical distribution
# =====================
# different draw numbers
k = np.linspace(0,25,500)
for n in num_simu:
for x in range(n):
# list used to store capital stock (kt) in the last periods (t=20)
kt = []
solow_srs = SRS(F=F, phi=lognorm, X=1.0)
px = solow_srs.sample_path(t)
kt.append(px[-1])
# generate the empirical distribution function
F = ECDF(kt)
prob_kt_n = [F(i) for i in k] # need to determine range
# n refers to the n-th draw
# ==================================
# use for-loop to create subplots
# ==================================
#k = np.linspace(0,25,500)
#num_rows,num_cols = 2,2
The difficulties to me are 1) How can I store list/array of empirical distribution results for different draw numbers in the given graph. 2) How to create subplots using a for-loop. I also encountered some other tiny errors.
Thank you for your suggestions.

About (1), my advice is to create a dictionary (i.e. something like d = {} and then d[n] = ECDF(data) for each number n of observations).
Dunno about (2).

Related

Simulating an elementary stochastic process in Python

I'm trying to simulate a simple stochastic process in Python, but with no success. The process is the following:
x(t + δt) = r(t) * x(t)
where r(t) is a Bernoulli random variable that can assume the values 1.5 or 0.6.
I've tried the following:
n = 10
r = np.zeros( (1,n))
for i in range(0, n, 1):
if r[1,i] == r[1,0]:
r[1,i] = 1
else:
B = bernoulli.rvs(0.5, size=1)
if B == 0:
r[1,i] = r[1,i-1] * 0.6
else:
r[1,i] = r[1,i-1] * 1.5
Can you explain why this is wrong and a possible solution?

So , first thing is that the SDE should be perceived over time, so you also need to consider the discretization rather than just giving the number of steps through n .
Essentially, what you are asking is just a simple random walk with a Bernoulli random variable taking on the values 0.5 and 1.6 instead of a Gaussian (standard normal) random variable.
So I have created an answer here, using NumPy to create the Bernoulli random variable for efficiency (numpy is faster than scipy) and then running the simulation with a stepsize of 0.01 then plotting the solution using matplotlib.
One thing to note that this SDE is one dimensional so we can just store the state and time in separate vectors and plot them at the end.
# Function generating bernoulli trial (your r(t))
def get_bernoulli(p=0.5):
'''
Function using numpy (faster than scipy.stats)
to generate bernoulli random variable with values 0.5 or 1.6
'''
B = np.random.binomial(1, p, 1)
if B == 0:
return 0.6
else:
return 1.5
This is then used in the simulation as
import numpy as np
import matplotlib.pyplot as plt
dt = 0.01 #step size
x0 = 1# initialize
tfinal = 1
sqrtdt = np.sqrt(dt)
n = int(tfinal/dt)
# State and time vectors
xtraj = np.zeros(n+1, float)
trange = np.linspace(start=0,stop=tfinal ,num=n+1)
# initialized
xtraj[0] = x0
for i in range(n):
xtraj[i+1] = xtraj[i] * get_bernoulli(p=0.5)
plt.plot(trange,xtraj,label=r'$x(t)$')
plt.xlabel("time")
plt.ylabel(r"$X$")
plt.legend()
plt.show()
Where we assumed the Bernoulli trial is fair, but can be customized to add some more variation.

Make a chi-2 test and 3d spectral test with python LCG

I'm developping for my math stuff but I'm blocked. Here is the statement :
We denote by U and V two sequences of random floats, obtained using a random type generator in the interval [0,1].
If we combine the U and V sequences according to the transformation: Z = m + σ √(−2lnU) sin (2πV). We obtain a variable Z according to the law N (m, σ).
I want to obtain a normal distribution with mean m = 3 and standard deviation 1.
So, I've generated my array of 30 000 Z floats with :
def generateFloat(tr):
list = []
for i in range(tr):
list.append(random.uniform(0,1))
return list
def generateZ(nb,m,o):
U = generateFloat(nb)
V = generateFloat(nb)
tab = []
for i in range(nb):
Z = m+(o*sqrt(-2*log(U[i])) * sin (2*pi*V[i]) )
tab.append(Z)
return tab
I would like to make the chi-square test and do the 3D spectral test. Actually my chi-square test give me weird values...
def khi2(x):
S = 0
E = len(x)/(max(x))
print(max(x)-1)
for i in range(max(x)):
O = x[i]
S += ((O-E)**2)/E
return S
and for my 3d spectral test i have this code :
from mpl_toolkits import mplot3d
from matplotlib import pyplot
import numpy as np
fig = pyplot.figure()
ax = pyplot.axes()
num_bars = 30000
x_pos = genererFlottant(num_bars)
y_pos = genererFlottant(num_bars)
ax.scatter(x_pos, y_pos)
pyplot.show()
I think but i'm not sure about values that i obtened are good on my graph... And this isn't in 3d. So, i don't think it's a good graph
I don't know if i'm doing right. If someone can help me with something, i take any help that i can have :/
thank you.

efficient sampling from beta-binomial distribution in python

for a stochastic simulation I need to draw a lot of random numbers which are beta binomial distributed.
At the moment I implemented it this way (using python):
import scipy as scp
from scipy.stats import rv_discrete
class beta_binomial(rv_discrete):
"""
creating betabinomial distribution by defining its pmf
"""
def _pmf(self, k, a, b, n):
return scp.special.binom(n,k)*scp.special.beta(k+a,n-k+b)/scp.special.beta(a,b)
so sampling a random number x can be done by:
betabinomial = beta_binomial(name="betabinomial")
x = betabinomial.rvs(0.5,0.5,3) # with some parameter
The problem is, that sampling one random number takes ca. 0.5ms, which is in my case dominating the whole simulation speed. The limiting element is the evaluation of the beta functions (or gamma functions within these).
Does anyone has a great idea how to speed up the sampling?

Well, here is working and lightly tested code which seems to be faster, using compound distribution property of Beta-Binomial.
We sample p from beta and then using it as parameter for binomial. If you would sample large sized vectors, it would be even faster.
import numpy as np
def sample_Beta_Binomial(a, b, n, size=None):
p = np.random.beta(a, b, size=size)
r = np.random.binomial(n, p)
return r
np.random.seed(777777)
q = sample_Beta_Binomial(0.5, 0.5, 3, size=10)
print(q)
Output is
[3 1 3 2 0 0 0 3 0 3]
Quick test
np.random.seed(777777)
n = 10
a = 2.
b = 2.
N = 100000
q = sample_Beta_Binomial(a, b, n, size=N)
h = np.zeros(n+1, dtype=np.float64) # histogram
for v in q: # fill it
h[v] += 1.0
h /= np.float64(N) # normalization
print(h)
prints histogram
[0.03752 0.07096 0.09314 0.1114 0.12286 0.12569 0.12254 0.1127 0.09548 0.06967 0.03804]
which is quite similar to green graph in the Wiki page on Beta-Binomial

Scipy Minimize Not Working

I'm running the minimization below:
from scipy.optimize import minimize
import numpy as np
import math
import matplotlib.pyplot as plt
### objective function ###
def Rlzd_Vol1(w1, S):
L = len(S) - 1
m = len(S[0])
# Compute log returns, size (L, m)
LR = np.array([np.diff(np.log(S[:,j])) for j in xrange(m)]).T
# Compute weighted returns
w = np.array([w1, 1.0 - w1])
R = np.array([np.sum(w*LR[i,:]) for i in xrange(L)]) # size L
# Compute Realized Vol.
vol = np.std(R) * math.sqrt(260)
return vol
# stock prices
S = np.exp(np.random.normal(size=(50,2)))
### optimization ###
obj_fun = lambda w1: Rlzd_Vol1(w1, S)
w1_0 = 0.1
res = minimize(obj_fun, w1_0)
print res
### Plot objective function ###
fig_obj = plt.figure()
ax_obj = fig_obj.add_subplot(111)
n = 100
w1 = np.linspace(0.0, 1.0, n)
y_obj = np.zeros(n)
for i in xrange(n):
y_obj[i] = obj_fun(w1[i])
ax_obj.plot(w1, y_obj)
plt.show()
The objective function shows an obvious minimum (it's quadratic):
But the minimization output tells me the minimum is at 0.1, the initial point:
I cannot figure out what's going wrong. Any thoughts?

w1 is passed in as a (single entry) vector and not as scalar from the minimize routine. Try what happens if you define w1 = np.array([0.2]) and then calculate w = np.array([w1, 1.0 - w1]). You'll see you get a 2x1 matrix instead of a 2 entry vector.
To make your objective function able to handle w1 being an array you can simply put in an explicit conversion to float w1 = float(w1) as the first line of Rlzd_Vol1. Doing so I obtain the correct minimum.
Note that you might want to use scipy.optimize.minimize_scalar instead especially if you can bracket where you minimum will be.

Python Code: Geometric Brownian Motion - what's wrong?

I'm pretty new to Python, but for a paper in University I need to apply some models, using preferably Python. I spent a couple of days with the code I attached, but I can't really help, what's wrong, it's not creating a random process which looks like standard brownian motions with drift. My parameters like mu and sigma (expected return or drift and volatility) tend to change nothing but the slope of the noise process. That's my problem, it all looks like noise. Hope my problem is specific enough, here is my coode:
import math
from matplotlib.pyplot import *
from numpy import *
from numpy.random import standard_normal
'''
geometric brownian motion with drift!
Spezifikationen:
mu=drift factor [Annahme von Risikoneutralitaet]
sigma: volatility in %
T: time span
dt: lenght of steps
S0: Stock Price in t=0
W: Brownian Motion with Drift N[0,1]
'''
T=1
mu=0.025
sigma=0.1
S0=20
dt=0.01
Steps=round(T/dt)
t=(arange(0, Steps))
x=arange(0, Steps)
W=(standard_normal(size=Steps)+mu*t)### standard brownian motion###
X=(mu-0.5*sigma**2)*dt+(sigma*sqrt(dt)*W) ###geometric brownian motion####
y=S0*math.e**(X)
plot(t,y)
show()

According to Wikipedia,
So it appears that
X=(mu-0.5*sigma**2)*t+(sigma*W) ###geometric brownian motion####
rather than
X=(mu-0.5*sigma**2)*dt+(sigma*sqrt(dt)*W)
Since T represents the time horizon, I think t should be
t = np.linspace(0, T, N)
Now, according to these Matlab examples (here and here), it appears
W = np.random.standard_normal(size = N)
W = np.cumsum(W)*np.sqrt(dt) ### standard brownian motion ###
not,
W=(standard_normal(size=Steps)+mu*t)
Please check the math, however, I could be wrong.
So, putting it all together:
import matplotlib.pyplot as plt
import numpy as np
T = 2
mu = 0.1
sigma = 0.01
S0 = 20
dt = 0.01
N = round(T/dt)
t = np.linspace(0, T, N)
W = np.random.standard_normal(size = N)
W = np.cumsum(W)*np.sqrt(dt) ### standard brownian motion ###
X = (mu-0.5*sigma**2)*t + sigma*W
S = S0*np.exp(X) ### geometric brownian motion ###
plt.plot(t, S)
plt.show()
yields

An additional implementation using the parametrization of the gaussian law though the normal fonction (instead of standard_normal), a bit shorter.
import numpy as np
T = 2
mu = 0.1
sigma = 0.01
S0 = 20
dt = 0.01
N = round(T/dt)
# reversely you can specify N and then compute dt, which is more common in financial litterature
X = np.random.normal(mu * dt, sigma* np.sqrt(dt), N)
X = np.cumsum(X)
S = S0 * np.exp(X)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Exercise on calculating and plotting cumulated empirical distribution - python

About (1), my advice is to create a dictionary (i.e. something like d = {} and then d[n] = ECDF(data) for each number n of observations). Dunno about (2).

Related

Simulating an elementary stochastic process in Python

Make a chi-2 test and 3d spectral test with python LCG

efficient sampling from beta-binomial distribution in python

Scipy Minimize Not Working

Python Code: Geometric Brownian Motion - what's wrong?

Categories

Resources