How to calculate risk contribution of assets in Python

How to calculate risk contribution of assets in Python - python

I'm trying to write a block of code that will allow me to identify the risk contribution of assets in a portfolio. The covariance matrix is a 6x6 pandas dataframe.
My code is as follows:
import numpy as np
import pandas as pd
weights = np.array([.1,.2,.05,.25,.1,.3])
data = pd.DataFrame(np.random.randn(1000,6),columns = 'a','b','c','d','e','f'])
covariance = data.cov()
portfolio_variance = (weights*covariance*weights.T)[0,0]
sigma = np.sqrt(portfolio_variance)
marginal_risk = covariance*weights.T
risk_contribution = np.multiply(marginal_risk, weights.T)/sigma
print(risk_contribution)
When I try to run the code I get a KeyError, and if I remove the [0,0] from portfolio_variance I get output that doesn't seem to make sense.
Can somebody point me to my error(s)?

Three problems with your code:
Open your list operator square brackets on line 6:
data = pd.DataFrame(np.random.randn(1000,6),columns = ['a','b','c','d','e','f'])
You're using the two dimensional indexing operator wrong. You can't say [0,0], you have to say [0][0].
And last, because you named the columns, you have to use them when indexing, so it's actually ['a'][0]:
portfolio_variance = (weights*covariance*weights.T)['a'][0]
Final working code:
import numpy as np
import pandas as pd
weights = np.array([.1,.2,.05,.25,.1,.3])
data = pd.DataFrame(np.random.randn(1000,6),columns = ['a','b','c','d','e','f'])
covariance = data.cov()
portfolio_variance = (weights*covariance*weights.T)['a'][0]
sigma = np.sqrt(portfolio_variance)
marginal_risk = covariance*weights.T
risk_contribution = np.multiply(marginal_risk, weights.T)/sigma
print(risk_contribution)

portfolio_variance =(weights*covariance*weights.T)
portfolio_variance should be
portfolio_variance =(weights#covariance#weights.T)
This will provide the portfolio variance, which should be a single number.
same for marginal risk, it should be
marginal_risk = covariance#weights.T

Related

Efficient expanding OLS in pandas

I would like to explore the solutions of performing expanding OLS in pandas (or other libraries that accept DataFrame/Series friendly) efficiently.
Assumming the dataset is large, I am NOT interested in any solutions with a for-loop;
I am looking for solutions about expanding rather than rolling. Rolling functions always require a fixed window while expanding uses a variable window (starting from beginning);
Please do not suggest pandas.stats.ols.MovingOLS because it is deprecated;
Please do not suggest other deprecated methods such as expanding_mean.
For example, there is a DataFrame df with two columns X and y. To make it simpler, let's just calculate beta.
Currently, I am thinking about something like
import numpy as np
import pandas as pd
import statsmodels.api as sm
def my_OLS_func(df, y_name, X_name):
y = df[y_name]
X = df[X_name]
X = sm.add_constant(X)
b = np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
return b
df = pd.DataFrame({'X':[1,2.5,3], 'y':[4,5,6.3]})
df['beta'] = df.expanding().apply(my_OLS_func, args = ('y', 'X'))
Expected values of df['beta'] are 0 (or NaN), 0.66666667, and 1.038462.
However, this method does not seem to work because the method seems very inflexible. I am not sure how one could pass the two Series as arguments.
Any suggestions would be appreciated.

One option is to use the RecursiveLS (recursive least squares) model from Statsmodels:
# Simulate some data
rs = np.random.RandomState(seed=12345)
nobs = 100000
beta = [10., -0.2]
sigma2 = 2.5
exog = sm.add_constant(rs.uniform(size=nobs))
eps = rs.normal(scale=sigma2**0.5, size=nobs)
endog = np.dot(exog, beta) + eps
# Construct and fit the recursive least squares model
mod = sm.RecursiveLS(endog, exog)
res = mod.fit()
# This is a 2 x 100,000 numpy array with the regression coefficients
# that would be estimated when using data from the beginning of the
# sample to each point. You should usually ignore the first k=2
# datapoints since they are controlled by a diffuse prior.
res.recursive_coefficients.filtered

getting linear models fama macbeth function output

I am having an issue with this function. I am wanting to perform a cross-sectional regression on 25 portfolios ranked on value and size. I have 7 independent variables as the right side of the equation.
import pandas as pd
import numpy as np
from linearmodels import FamaMacBeth
#creating a multi_index of independent variables
ind_var = pd.read_excel('FAMA_MACBETH.xlsx')
ind_var['date'] = pd.to_datetime(ind_var['date'])
# dropping our dependent variables
ind_var = ind_var.drop(['Mkt_rf', 'div_innovations', 'term_innovations',
'def_innovations', 'rf_innovations', 'hml_innovations',
'smb_innovations'],axis = 1)
ind_var = pd.DataFrame(ind_var.set_index('date').stack())
ind_var.columns = ['x']
x = np.asarray(ind_var)
len(x)
11600
#creatiing a multi_index of dependent variables
# reading in our data
dep_var = pd.read_excel('FAMA_MACBETH.xlsx')
dep_var['date'] = pd.to_datetime(dep_var['date'])
# dropping our independent variables
dep_var = dep_var.drop(['SMALL_LoBM', 'ME1_BM2', 'ME1_BM3', 'ME1_BM4',
'SMALL_HiBM', 'ME2_BM1', 'ME2_BM2', 'ME2_BM3', 'ME2_BM4', 'ME2_BM5',
'ME3_BM1', 'ME3_BM2', 'ME3_BM3', 'ME3_BM4', 'ME3_BM5', 'ME4_BM1',
'ME4_BM2', 'ME4_BM3', 'ME4_BM4', 'ME4_BM5', 'BIG_LoBM', 'ME5_BM2',
'ME5_BM3', 'ME5_BM4', 'BIG_HiBM'],axis = 1)
dep_var = pd.DataFrame(dep_var.set_index('date').stack())
dep_var.columns = ['y']
y = np.asarray(dep_var)
len(y)
3248
mod = FamaMacBeth(y, x)
res = mod.fit(cov_type='kernel', kernel='Parzen')
output with tstats and errors ideally
I have tried numerous methods of getting this to work. I am really thinking of using SAS at this point. Really, I would prefer to get this running with pandas
I expect a cross-sectional regression output with standard errors and t stats

I got it to work in one go. See this site and run the lines of code for OLS below: "Here the difference is presented using the canonical Grunfeld data on investment."
(Note that this line is important: etdata = data.set_index(['firm','year']), else Python won't know the correct dimensions to run F&McB on.)
Then run:
from linearmodels import FamaMacBeth
FamaMacBeth(etdata.invest,etdata[['value','capital']]).fit()
Note, I updated linearmodels to the latest version, that got me access to the data.

Multiple Linear Regression using Python

Firstly, there are a few topics on this but they involve deprecated packages with pandas etc. Suppose I'm trying to predict a variable w with variables x,y and z. I want to run a multiple linear regression to try and predict w. There are quite a few solutions that will produce the coefficients but I'm not sure how to use these. So, in pseudocode;
import numpy as np
from scipy import stats
w = np.array((1,2,3,4,5,6,7,8,9,10)) # Time series I'm trying to predict
x = np.array((1,3,6,1,4,6,8,9,2,2)) # The three variables to predict w
y = np.array((2,7,6,1,5,6,3,9,5,7))
z = np.array((1,3,4,7,4,8,5,1,8,2))
def model(w,x,y,z):
# do something!
return guess # where guess is some 10 element array formed
# using multiple linear regression of x,y,z
guess = model(w,x,y,z)
r = stats.pearsonr(w,guess) # To see how good guess is
Hopefully this makes sense as I'm new to MLR. There is probably a package in scipy that does all this so any help welcome!

You can use the normal equation method.
Let your equation be of the form : ax+by+cz +d =w
Then
import numpy as np
x = np.asarray([[1,3,6,1,4,6,8,9,2,2],
[2,7,6,1,5,6,3,9,5,7],
[1,3,4,7,4,8,5,1,8,2],
[1,1,1,1,1,1,1,1,1,1]]).T
y = numpy.asarray([1,2,3,4,5,6,7,8,9,10]).T
a,b,c,d = np.linalg.pinv((x.T).dot(x)).dot(x.T.dot(y))

Think I've found out now. If anyone could confirm that this produces the correct results that'd be great!
import numpy as np
from scipy import stats
# What I'm trying to predict
y = [-6,-5,-10,-5,-8,-3,-6,-8,-8]
# Array that stores two predictors in columns
x = np.array([[-4.95,-4.55],[-10.96,-1.08],[-6.52,-0.81],[-7.01,-4.46],[-11.54,-5.87],[-4.52,-11.64],[-3.36,-7.45],[-2.36,-7.33],[-7.65,-10.03]])
# Fit linear least squares and get regression coefficients
beta_hat = np.linalg.lstsq(x,y)[0]
print(beta_hat)
# To store my best guess
estimate = np.zeros((9))
for i in range(0,9):
# y = x1b1 + x2b2
estimate[i] = beta_hat[0]*x[i,0]+beta_hat[1]*x[i,1]
# Correlation between best guess and real values
print(stats.pearsonr(estimate,y))

how to calculate total statistical distance in python

In this link total variation distance between two probability distribution is given.
I tried to calculate it in python. I have two datasets and firstly I calculated their probability distribution functions from histograms. Then I tried to get max differences of between two distributions. But it returns me very small values. It seems that I am doing a mistake in it. Can you please help to fix it?
import scipy.stats as st
#original data has shape of [45222,1] and it is numpy array
#synthetic data has shape of [45222,1] and it is numpy array
summation = 0
minOriginal = min(original)
minGenerated = min(synthetic)
maxOriginal = max(original)
maxGenerated = max(synthetic)
minHist = min(minOriginal, minGenerated)
maxHist = max(maxOriginal, maxGenerated)
originalHist = np.histogram(original, range=(minHist, maxHist))
hist_dist1 = st.rv_histogram(originalHist)
generatedHist = np.histogram(synthetic, range=(minHist, maxHist))
hist_dist2 = st.rv_histogram(generatedHist)
x = np.linspace(minHist, maxHist, 45000)
summation += max(abs(hist_dist1.pdf(x)-hist_dist2.pdf(x)))

perturbation analysis in pandas

For a given Series I want to change the value of each element around it's current value and then calculate an arbitrary function (here std) as shown in the following code:
import pandas as pd
import numpy as np
a = pd.Series(np.random.randn(10))
perturb = {}
for item in range(2,len(a)):
serturb = {}
for ep in np.arange(-1,1,0.1):
temp = a.ix[0:item]
temp.iloc[-1] += ep
serturb[ep] = temp.std()
perturb[item] = pd.Series(serturb)
perturb = pd.DataFrame(perturb).T
The above code will become too slow for a large amount of data. The above process, when applied on a DataFrame would return a Panel. Is there an efficient way of doing it, since a lot of the calculations are being repeated?

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to calculate risk contribution of assets in Python - python

portfolio_variance =(weightscovarianceweights.T) portfolio_variance should be portfolio_variance =(weights#covariance#weights.T) This will provide the portfolio variance, which should be a single number. same for marginal risk, it should be marginal_risk = covariance#weights.T

Related

Efficient expanding OLS in pandas

getting linear models fama macbeth function output

Multiple Linear Regression using Python

how to calculate total statistical distance in python

perturbation analysis in pandas

Categories

Resources

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to calculate risk contribution of assets in Python - python

portfolio_variance =(weights*covariance*weights.T) portfolio_variance should be portfolio_variance =(weights#covariance#weights.T) This will provide the portfolio variance, which should be a single number. same for marginal risk, it should be marginal_risk = covariance#weights.T

Related

Efficient expanding OLS in pandas

getting linear models fama macbeth function output

Multiple Linear Regression using Python

how to calculate total statistical distance in python

perturbation analysis in pandas

Categories

Resources

portfolio_variance =(weightscovarianceweights.T) portfolio_variance should be portfolio_variance =(weights#covariance#weights.T) This will provide the portfolio variance, which should be a single number. same for marginal risk, it should be marginal_risk = covariance#weights.T