I tried to make a simple Monte Carlo simulation for stock investments where you start with an investment value, an investment period (in years), and the mean and standard deviation of a stock mutual fund. I also wanted to implement a simple stock market crash: whenever the newly calculated value is more than 40 % higher than the previous one, the new value should fall by 90 %, like some kind of crash. I got the code below to run, but I don't think it works correctly. The problem is probably hidden where I refer to the previous value. Could you help me get it working?
import matplotlib
import matplotlib.pyplot as plt
import random
import numpy as np

mean = 7.0  #mean for stock mutual fund
std = 19.0  #std for stock mutual fund

def investment_return():  #random normal distribution of returns
    investment_return = (np.random.normal(mean, std))/100
    return investment_return

def investor(A, B):
    investment_value = A
    investment_period = B
    wX = []
    vY = []
    x = 1
    while x <= investment_period:
        value = A + A*investment_return()
        if value > value * 1.4:  #if new value is 1.4x bigger than previous
            A = value * 0.1      #then make a -90 percent adjustment
        else:
            A = value            #else use new value
        wX.append(x)
        vY.append(value)
        x += 1
        #print(value)
    plt.plot(wX, vY)

i = 0
while i < 10:          #number of investors
    investor(100, 20)  #starting value and investment period
    i += 1

plt.ylabel('Investment_value')
plt.xlabel('Investment_period')
plt.show()
Well, I tried my best to interpret what you were after. It helped that you provided a solid basis to work with :).
OK, so here we go: obviously, Kevin's remark that value > value * 1.4 will never evaluate to True is a solid one. I also renamed some variables (for example, stocks are normally compared as indices, so I renamed A to index). Time is generally referred to as t, not x. The while loops were a little quirky, so I got rid of those.
import matplotlib.pyplot as plt
import numpy as np

mean = 7.0
std = 19.0

def investment_return():
    return (np.random.normal(mean, std)) / 100

def investor(index, period):
    wT = []
    vY = []
    for t in range(1, period + 1):
        new_index = index + index * investment_return()
        if new_index > index * 1.4:
            index = new_index * 0.1
        else:
            index = new_index
        wT.append(t)
        vY.append(index)
    return wT, vY

for i in range(0, 10):
    wT, vY = investor(100, 20)
    # do something with your data
    plt.plot(wT, vY)

plt.ylabel('Investment_value')
plt.xlabel('Investment_period')
plt.show()
This occasionally does produce a stock crash, as can clearly be seen (do keep in mind that a crash requires sampling a value above 40 from an N(7, 19) distribution, which should not happen in roughly 96% of all steps).
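As a quick sanity check of that claim (a small sketch using scipy.stats, not part of the original code), the per-step crash probability can be computed directly:

from scipy.stats import norm

# Probability that a single N(7, 19) return draw exceeds 40,
# i.e. a >40% jump, which is what triggers the crash branch above.
p_crash = norm.sf(40, loc=7, scale=19)
print(p_crash)  # about 0.04, so no crash in roughly 96% of steps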
I have a Python script in which, for every new sample, I have to update the standard deviation of the samples array using a rolling window of length N. Using the naive standard deviation formula, the code is really slow. I found many different solutions for online calculation, but none of them consider a rolling window for the update. Some alternative ways of computing the variance are explained here (Welford's algorithm, parallel, ...):
https://en.m.wikipedia.org/wiki/Algorithms_for_calculating_variance
but none of them actually use a rolling window through the data set.
What I'm looking for is a fast algorithm that isn't prone to the catastrophic cancellation phenomenon.
Formulas will be appreciated.
Thanks for your help, guys.
Here's an adaptation of the code at the link I put in a comment. It takes O(1) (constant) time for each element added, regardless of window size. Note that if you configure it for a window of N elements, the first N-1 results are more or less gibberish: it initializes the data to N zeroes. The class also saves the most recent N entries in a collections.deque of maximum size N. Note that this computes the "sample" standard deviation, not the "population" one. Season to taste ;-)
from collections import deque
from math import sqrt

class Running:
    def __init__(self, n):
        self.n = n
        self.data = deque([0.0] * n, maxlen=n)
        self.mean = self.variance = self.sdev = 0.0

    def add(self, x):
        n = self.n
        oldmean = self.mean
        goingaway = self.data[0]
        self.mean = newmean = oldmean + (x - goingaway) / n
        self.data.append(x)
        self.variance += (x - goingaway) * (
            (x - newmean) + (goingaway - oldmean)) / (n - 1)
        self.sdev = sqrt(self.variance)
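Since the question asked for formulas: with x the incoming sample, g the outgoing (oldest) sample, and N the window length, the updates performed by add() above are

    new_mean = old_mean + (x - g) / N
    variance += (x - g) * ((x - new_mean) + (g - old_mean)) / (N - 1)

i.e. replacing one sample in a full window shifts the mean by (x - g)/N and adjusts the sum of squared deviations by (x - g)*((x - new_mean) + (g - old_mean)); dividing by N - 1 tracks the "sample" variance directly.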
Here's a by-eyeball sanity check. Seems fine. But note that the statistics module makes heroic (and slow!) efforts to maximize floating-point accuracy. The code above just accepts accumulating half a dozen fresh rounding errors per element added.
import statistics
from random import random
from math import ulp

r = Running(50)
for i in range(1000000):
    r.add(random() * 100)
    assert len(r.data) == 50
    if i % 1000 == 0.0:
        a, b = r.mean, statistics.mean(r.data)
        print(i, "mean", a, b, (a - b) / ulp(b))
        a, b = r.sdev, statistics.stdev(r.data)
        print(i, "sdev", a, b, (a - b) / ulp(b))
Sample output (will vary across runs):
0 mean 1.4656985567210468 1.4656985567210468 0.0
0 sdev 10.364053886327875 10.364053886327877 -1.0
1000 mean 50.73313401192864 50.73313401192878 -20.0
1000 sdev 31.06576415649153 31.06576415649151 5.0
2000 mean 50.4175663202043 50.41756632020437 -10.0
2000 sdev 27.692406266774878 27.69240626677488 -1.0
3000 mean 53.054435599235525 53.0544355992356 -11.0
3000 sdev 32.439246859431556 32.439246859431606 -7.0
4000 mean 51.66216784517698 51.662167845177 -3.0
4000 sdev 31.026902004950404 31.02690200495047 -18.0
5000 mean 54.08949367166644 54.089493671666425 2.0
5000 sdev 29.405357061221196 29.40535706122128 -24.0
...
I have a function that I use for standard deviation. I adapted it from a PHP function that calculated standard deviation. You can pass it a list (or a slice of a list), and it will calculate the standard deviation for that list or slice.
import math

def calc_std_dev(lst, precision=0, sample=True):
    """
    :param: lst A Python list containing values
    :param: precision The number of decimal places desired.
    :param: sample Is the data a sample or a population? Set to True by
            default.
    """
    length = len(lst)
    if length == 0:
        print("The array has zero elements.")
        return False
    elif (length == 1) and sample:
        print("The array has only 1 element.")
        return False
    else:
        total = math.fsum(lst)
        # Calculate the arithmetic mean
        mean = total / length
        carry = 0.0
        for i in lst:
            dev = i - mean
            carry += dev * dev
        if sample:
            length = length - 1
        variance = carry / length
        std_dev = math.sqrt(variance)
        std_dev = round(std_dev, precision)
        return std_dev
When I need a rolling standard deviation, I pass in a slice of the total list to calculate the value.
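For example, a rolling window of length 4 over a short list of samples might look like this (a small sketch, not part of the original answer; the data values are made up):

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical samples
window = 4

rolling = []
for i in range(window, len(data) + 1):
    # Pass a slice of the last `window` values to the function above.
    rolling.append(calc_std_dev(data[i - window:i], precision=4))
print(rolling)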
Sorry, this is probably a very noob question, but I'm converting some modeling code from MATLAB to Python, both to help me learn Python and to see if it would run any faster. In MATLAB, this code takes about 1 second to run, but in Python it takes about 1 minute. Is there some way to speed it up, or is this not a good application for Python?
import numpy as np
import matplotlib.pyplot as plt

# NOTE: the physical constants used below (L, M, k, c, Ti, T2, Issmax, rho,
# d, w, rhom, cm, lm, wm, S) are defined elsewhere in the original script.

N = 7e5    #number of time steps
dt = 1e-6  #Time step, in seconds
tf = dt*N  #Final time, seconds.
trange = np.linspace(0, tf, int(N+1))  #time range
dx = L/M   #spatial step size in thermoelectric, meters

#Define dimensionless fourier number in thermoelectric
Fo = dt*(k/c)/(dx**2)

#temperature profile in thermoelectric as a function of space and time
T = np.zeros((M+1, 2))
#Allocate initial condition
T[:,0] = Ti
#Set boundary condition at x=L
T[M,:] = T2

#temperature v time profile of coldside of thermoelectric
coldTemp = np.zeros(len(trange))
#initial coldside temp
coldTemp[0] = Ti

#setting current to optimum DC value
I = Issmax

#iterate over timesteps
for p in range(int(N)):
    #Use central difference forward time method to find temperature within
    #thermoelectric material.
    for n in range(M-1):
        #calculate temp. change at next time step
        T[n+1,1] = T[n+1,0] + Fo*(T[n+2,0]-2*T[n+1,0]+T[n,0]) + dt*((I)**2*rho/(c*d**2*w**2))
    #Apply energy balance to the metal (assumed isothermal) and use the
    #fact that the metal temp is equal to the thermoelectric temp
    T[0,1] = T[0,0] + dt*((I)**2*rhom/(cm*lm**2*wm**2)) - (k*dt/(cm*dx*lm))*(T[0,0]-T[1,0]) - (dt*(I)*S*T[0,0]/(cm*d*w*lm))
    #Saving coldside temp
    coldTemp[p+1] = T[0,1]
    #Setting current temperature profile to be calculated one
    T[:,0] = T[:,1]

#Plotting coldside temp vs time
plt.plot(trange, coldTemp)
The suggestions in the comments above are good, but before anything else, you are violating perhaps the #1 rule of making loops faster: don't do things inside a loop that can be done outside. You are re-computing the same values billions of times, and they are "expensive", with division and exponentiation. Consider something like this... (check the math, I wasn't too careful, and perhaps there is more you can do)
...
# calculate the constants...
c1 = dt*((I)**2*rho/(c*d**2*w**2))
c2 = dt*((I)**2*rhom/(cm*lm**2*wm**2))
c3 = (k*dt/(cm*dx*lm))
c4 = (dt*(I)*S/(cm*d*w*lm))

#iterate over timesteps
for p in range(int(N)):
    #Use central difference forward time method to find temperature within
    #thermoelectric material.
    for n in range(M-1):
        #calculate temp. change at next time step
        T[n+1,1] = T[n+1,0] + Fo*(T[n+2,0]-2*T[n+1,0]+T[n,0]) + c1
    #Apply energy balance to the metal (assumed isothermal) and use the
    #fact that the metal temp is equal to the thermoelectric temp
    T[0,1] = T[0,0] + c2 - c3*(T[0,0]-T[1,0]) - c4*T[0,0]
    #Saving coldside temp
    coldTemp[p+1] = T[0,1]
    #Setting current temperature profile to be calculated one
    T[:,0] = T[:,1]
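Beyond hoisting the constants, the inner spatial loop can usually be replaced by a single NumPy slice operation, which removes the Python-level loop over n entirely. A minimal sketch of the idea, with placeholder values standing in for the physical constants (they are not from the original question):

import numpy as np

# Placeholder values purely for illustration; substitute your own constants.
M = 100
Fo = 0.3
c1 = 1e-4
T = np.zeros((M + 1, 2))
T[:, 0] = 300.0   # hypothetical initial temperature profile
T[M, :] = 280.0   # hypothetical fixed boundary at x = L

# Vectorized equivalent of:
#   for n in range(M-1):
#       T[n+1,1] = T[n+1,0] + Fo*(T[n+2,0] - 2*T[n+1,0] + T[n,0]) + c1
T[1:M, 1] = T[1:M, 0] + Fo * (T[2:M+1, 0] - 2 * T[1:M, 0] + T[0:M-1, 0]) + c1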
I'm trying to simulate a simple stochastic process in Python, but with no success. The process is the following:
x(t + δt) = r(t) * x(t)
where r(t) is a Bernoulli random variable that can assume the values 1.5 or 0.6.
I've tried the following:
import numpy as np
from scipy.stats import bernoulli

n = 10
r = np.zeros((1, n))
for i in range(0, n, 1):
    if r[1, i] == r[1, 0]:
        r[1, i] = 1
    else:
        B = bernoulli.rvs(0.5, size=1)
        if B == 0:
            r[1, i] = r[1, i-1] * 0.6
        else:
            r[1, i] = r[1, i-1] * 1.5
Can you explain why this is wrong and suggest a possible solution?
So, the first thing is that the SDE should be viewed over time, so you also need to consider the discretization rather than just giving the number of steps through n.
Essentially, what you are asking for is just a simple multiplicative random walk with a Bernoulli random variable taking on the values 0.6 and 1.5 instead of a Gaussian (standard normal) random variable.
So I have created an answer here, using NumPy to create the Bernoulli random variable for efficiency (numpy is faster than scipy), running the simulation with a step size of 0.01, and then plotting the solution using matplotlib.
One thing to note is that this SDE is one-dimensional, so we can just store the state and time in separate vectors and plot them at the end.
# Function generating the Bernoulli trial (your r(t))
def get_bernoulli(p=0.5):
    '''
    Function using numpy (faster than scipy.stats)
    to generate a Bernoulli random variable with values 0.6 or 1.5
    '''
    B = np.random.binomial(1, p, 1)
    if B == 0:
        return 0.6
    else:
        return 1.5
This is then used in the simulation as
import numpy as np
import matplotlib.pyplot as plt

dt = 0.01  # step size
x0 = 1     # initialize
tfinal = 1
sqrtdt = np.sqrt(dt)
n = int(tfinal/dt)

# State and time vectors
xtraj = np.zeros(n+1, float)
trange = np.linspace(start=0, stop=tfinal, num=n+1)

# initialized
xtraj[0] = x0

for i in range(n):
    xtraj[i+1] = xtraj[i] * get_bernoulli(p=0.5)

plt.plot(trange, xtraj, label=r'$x(t)$')
plt.xlabel("time")
plt.ylabel(r"$X$")
plt.legend()
plt.show()
Here we assumed the Bernoulli trial is fair, but p can be customized to add some more variation.
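As an aside (not part of the original answer), since each step just multiplies the state by an independent draw, the whole trajectory can also be generated in one shot with NumPy, avoiding the Python loop; a minimal sketch under the same settings:

import numpy as np
import matplotlib.pyplot as plt

dt = 0.01
tfinal = 1
n = int(tfinal / dt)
x0 = 1.0

# Draw all n multipliers at once: 0.6 or 1.5, each with probability 0.5.
r = np.random.choice([0.6, 1.5], size=n, p=[0.5, 0.5])
xtraj = x0 * np.concatenate(([1.0], np.cumprod(r)))
trange = np.linspace(0, tfinal, n + 1)

plt.plot(trange, xtraj)
plt.xlabel("time")
plt.ylabel("x(t)")
plt.show()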
I am trying to sum the values in Callpayoffs, as they represent the payoffs based on the last price generated in the asset-price loop for each path. If I run 10 simulations, there should be 10 Callpayoffs values, one for the last price of each simulation path of 252 price points. Unfortunately, I'm not able to add up the values in the Callpayoffs list in any way that lets me take the average Callpayoff over the 10 simulations. I would really appreciate any help. Below is a sample of the print(sum(Callpayoffs)) output; as you can see, the code only divides the last value by 10, the number of simulations:
[0]
[0]
[0]
[16.651081469090343]
[14.076846993975735]
[9.483857458061152]
[5.357562042338017]
[6.09266787737144]
[0]
[27.85935401436157]
2.785935401436157  # this is the last value divided by the number of simulations, but it should be the sum of all the values above divided by simulations
import numpy as np
import pandas as pd
from math import *
import matplotlib.pyplot as plt
from matplotlib import *

def Generate_asset_price(S, v, r, dt):
    return (1 + r * dt + v * sqrt(dt) * np.random.normal(0, 1))

# initial values
S = 100
v = 0.2
r = 0.05
T = 1
N = 252  # number of steps
dt = 0.00396825
simulations = 10

for x in range(simulations):
    stream = [100]
    Callpayoffs = []
    t = 0
    for n in range(N):
        s = stream[t] * Generate_asset_price(S, v, r, dt)
        stream.append(s)
        t += 1
    Callpayoffs.append(max(stream[-1] - S, 0))
    plt.plot(stream)
    print(Callpayoffs)

print(sum(Callpayoffs))
(sum(Callpayoffs)) / float(simulations)
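One way to get what is described above (one payoff per simulated path, then the average over all paths) is to create the Callpayoffs list once before the simulation loop instead of inside it, so it is not reset on every path. A minimal sketch reusing the names and parameters from the code above, not a full rewrite:

Callpayoffs = []                      # created once, outside the loop
for x in range(simulations):
    stream = [100]
    for n in range(N):
        stream.append(stream[-1] * Generate_asset_price(S, v, r, dt))
    Callpayoffs.append(max(stream[-1] - S, 0))   # one payoff per path
    plt.plot(stream)

average_payoff = sum(Callpayoffs) / float(simulations)
print(Callpayoffs)       # 10 payoffs, one per simulation
print(average_payoff)    # their average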
I'm using x = numpy.random.rand(1) to generate a random number between 0 and 1. How do I make it so that x > .5 is 2 times more probable than x < .5?
Just do a little manipulation of the inputs. First set x to be in the range from 0 to 1.5:
x = numpy.random.uniform(0, 1.5)
x has a 2/3 chance of being greater than 0.5 and a 1/3 chance of being smaller. Then, if x is greater than or equal to 1.0, subtract 0.5 from it:
if x >= 1.0:
    x = x - 0.5
This is overkill for you, but it's good to know an actual method for generating a random number with any probability density function (pdf).
You can do that by subclassing scipy.stats.rv_continuous, provided you do it correctly. You will have to have a normalized pdf (so that its integral is 1). If you don't, numpy will automatically adjust the range for you. In this case, your pdf has a value of 2/3 for x<0.5, and 4/3 for x>0.5, with a support of [0, 1) (support is the interval over which it's nonzero):
import scipy.stats as spst
import numpy as np
import matplotlib.pyplot as plt

def pdf_shape(x, k):
    if x < 0.5:
        return 2/3.
    elif 0.5 <= x and x < 1:
        return 4/3.
    else:
        return 0.

class custom_pdf(spst.rv_continuous):
    def _pdf(self, x, k):
        return pdf_shape(x, k)

instance = custom_pdf(a=0, b=1)
samps = instance.rvs(k=1, size=10000)

plt.hist(samps, bins=20)
plt.show()
from random import random

tmp = random()
if tmp < 0.5: tmp = random()
is a pretty easy way to do it.
Ehh, I guess this makes the upper half 3x as likely rather than 2x ... that's what I get for sleeping through that class, I guess.
from random import random, uniform

def rand1():
    tmp = random()
    if tmp < 0.5: tmp = random()
    return tmp

def rand2():
    tmp = uniform(0, 1.5)
    return tmp if tmp <= 1.0 else tmp - 0.5

sample1 = []
sample2 = []
for i in range(10000):
    sample1.append(rand1() >= 0.5)
    sample2.append(rand2() >= 0.5)

print(sample1.count(True))  #~ 75%
print(sample2.count(True))  #~ 66% <- desired I believe :)
First off, numpy.random.rand(1) doesn't return a value in the [0,1) range (half-open, includes zero but not one); it returns an array of size one, containing values in that range, and the upper end of the range has nothing to do with the argument passed in.
The function you're probably after is the uniform distribution one, numpy.random.uniform(), since this allows an arbitrary upper range.
And making the upper half twice as likely is a relatively simple matter.
Take, for example, a random number generator r(n) which returns a uniformly distributed integer in the range [0,n). All you need to do is adjust the values to change the distribution:
x = r(3)   # 0, 1 or 2, 1/3 probability each
if x == 2:
    x = 1  # Now either 0 (1/3) or 1 (2/3)
Now the chances of getting zero are 1/3 while the chances of getting one are 2/3, basically what you're trying to achieve with your floating point values.
So I would simply get a random number in the range [0,1.5), then subtract 0.5 if it's greater than or equal to one.
x = numpy.random.uniform(high=1.5)
if x >= 1: x -= 0.5
Since the original distribution should be even across the [0,1.5) range, the subtraction should make [0.5,1.0) twice as likely (and [1.0,1.5) impossible), while keeping the distribution even within each section ([0,0.5) and [0.5,1)):
[0.0,0.5) [0.5,1.0) [1.0,1.5)   before
<--------><--------><-------->
[0.0,0.5) [0.5,1.0) [0.5,1.0)   after
You could take a "mixture model" approach where you split the process into two steps: first, decide whether to take option A or B, where B is twice as likely as A; then, if you chose A, return a random number between 0.0 and 0.5, else if you chose B, return one between 0.5 and 1.0.
In the example below, randint returns 0, 1, or 2 with equal probability, so the else case is twice as likely as the if case.
m = numpy.random.randint(3)
if m == 0:
    x = numpy.random.uniform(0.0, 0.5)
else:
    x = numpy.random.uniform(0.5, 1.0)
This is a little more expensive (two random draws instead of one) but it can generalize to more complicated distributions in a fairly straightforward way.
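For instance, a hypothetical three-component version (a sketch, not from the original answer) that weights the intervals [0, 1/3), [1/3, 2/3), and [2/3, 1) in the ratio 1:2:3 could look like this:

import numpy

# Pick a component with probabilities proportional to 1, 2, 3.
component = numpy.random.choice(3, p=[1/6, 2/6, 3/6])
# Then draw uniformly within that component's interval.
low = component / 3.0
x = numpy.random.uniform(low, low + 1/3.0)
print(x)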
If you want a smoother, non-uniform bias, you can just square the output of the random function (and subtract the square from 1 to make x > 0.5 more probable instead of x < 0.5):
x = 1 - numpy.random.rand(1)**2
Note that this gives P(x > 0.5) = 1/sqrt(2) ≈ 0.71 rather than exactly 2/3.
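A quick empirical check of that last approach (a sketch, not part of the original answer):

import numpy

x = 1 - numpy.random.rand(100000)**2
print((x > 0.5).mean())  # roughly 0.707, i.e. biased toward the upper half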