cvxpy.error.DCPError: Problem does not follow DCP rules - python

Getting my head around the DCP rules. I am looking at the portfolio optimisation example provided on the CVXPY website (original code below). I had a look at some of the other questions that deal with DCP rules but couldn't find the answer I wanted.
I tried replacing the Sigma (i.e. covariance) in their code, which is randomly generated, with a covariance matrix computed from historic returns for some of the asset classes. Everything else is the same.
Yet I get cvxpy.error.DCPError: Problem does not follow DCP rules.
I have also added pictures of the two Sigmas (one generated randomly by the CVXPY code, and the other, Sigma(1), the historic covariance array I use).
Both are 9×9 arrays, but as I mentioned, replacing the randomly generated array with the array of historic numbers gives me that error, with all other code kept the same. Any idea what's causing this issue?
# Generate data for long only portfolio optimization.
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
np.random.seed(1)
n = 10
mu = np.abs(np.random.randn(n, 1))
Sigma = np.random.randn(n, n)
Sigma = Sigma.T.dot(Sigma)
# Long only portfolio optimization.
import cvxpy as cp
w = cp.Variable(n)
gamma = cp.Parameter(nonneg=True)
ret = mu.T @ w
risk = cp.quad_form(w, Sigma)
prob = cp.Problem(cp.Maximize(ret - gamma*risk),
                  [cp.sum(w) == 1,
                   w >= 0])
# Compute trade-off curve.
SAMPLES = 100
risk_data = np.zeros(SAMPLES)
ret_data = np.zeros(SAMPLES)
gamma_vals = np.logspace(-2, 3, num=SAMPLES)
for i in range(SAMPLES):
    gamma.value = gamma_vals[i]
    prob.solve()
    risk_data[i] = cp.sqrt(risk).value
    ret_data[i] = ret.value
# Plot long only trade-off curve.
import matplotlib.pyplot as plt
#%matplotlib inline
#%config InlineBackend.figure_format = 'svg'
markers_on = [29, 40]
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(risk_data, ret_data, 'g-')
for marker in markers_on:
    plt.plot(risk_data[marker], ret_data[marker], 'bs')
    ax.annotate(r"$\gamma = %.2f$" % gamma_vals[marker], xy=(risk_data[marker]+.08, ret_data[marker]-.03))
for i in range(n):
    plt.plot(cp.sqrt(Sigma[i,i]).value, mu[i], 'ro')
plt.xlabel('Standard deviation')
plt.ylabel('Return')
plt.show()

Observation: it is possible that the variance-covariance matrix in a portfolio optimization is not positive semi-definite. In theory the covariance matrix can be shown to be positive semi-definite. However, due to floating-point rounding errors, we may actually see (slightly) negative eigenvalues. (Note: a positive semi-definite matrix has non-negative eigenvalues.)
I know of three approaches to handle this:
Use a statistical technique called shrinkage,
Perturb the diagonal a bit by adding a small constant to each diagonal element (sketched below),
Use a variant of the standard portfolio model based on mean-adjusted returns.
For details see link.
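Here is a minimal sketch of the second approach, assuming the historic 9×9 covariance array is held in a variable I'll call Sigma_hist (the name and the eps value are mine):
import numpy as np

def make_psd(sigma, eps=1e-8):
    # Shift the diagonal just enough that all eigenvalues become non-negative.
    min_eig = np.linalg.eigvalsh(sigma).min()
    if min_eig < eps:
        sigma = sigma + (eps - min_eig) * np.eye(sigma.shape[0])
    return sigma

# Sigma_hist = ...  # hypothetical: the covariance computed from historic returns
# print(np.linalg.eigvalsh(Sigma_hist).min())   # negative => not PSD, hence the DCPError
# risk = cp.quad_form(w, make_psd(Sigma_hist))  # quad_form is then recognised as convex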

Related

Using scipy.fftpack.fft, how to interpret the numerical result of a Fourier transform

The analytical Fourier transform of a sinusoidal signal is purely imaginary. However, when numerically computing the discrete Fourier transform, the result is not.
Tl;dr: Find all answers to this question here.
Consider therefore the following code
import matplotlib.pyplot as plt
import numpy as np
from scipy.fftpack import fft, fftfreq
f_s = 200 # Sampling rate = number of measurements per second in [Hz]
t = np.arange(0,10000, 1 / f_s)
N = len(t)
A = 4 # Amplitude of sinus signal
x = A * np.sin(t)
X = fft(x)[1:N//2]
freqs = (fftfreq(len(x)) * f_s)[1:N//2]
fig, (ax1,ax2) = plt.subplots(2,1, sharex = True)
ax1.plot(freqs, X.real, label = r"$\Re[X(\omega)]$")
ax1.plot(freqs, X.imag, label = r"$\Im[X(\omega)]$")
ax1.set_title(r"Discrete Fourier Transform of $x(t) = A \cdot \sin(t)$")
ax1.legend()
ax1.grid(True)
ax2.plot(freqs, np.abs(X), label = r"$|X(\omega)|$")
ax2.legend()
ax2.set_xlabel(r"Frequency $\omega$")
ax2.set_yscale("log")
ax2.grid(True, which = "both")
ax2.set_xlim(0.15,0.175)
plt.show()
Clearly, the absolute value |X(w)| can be used as a good approximation to the analytical result. However, the imaginary and real parts of X(w) are different. Another question on SO already mentioned this fact, but did not explain why. So can I only use the absolute value and the phase?
Another question is how the amplitude is related to the numerical result. Mathematically speaking, it should be the integral under the curve of |X(w)| divided by a normalization (which, as far as I understood, should be given by N), i.e. approximately
A_approx = np.sum(np.abs(X)) / N
print(f"Numerical value: {A_approx:.1f}, Correct value: {A:.1f}")
Numerical value: 13.5, Correct value: 4.0
This does not seem to be the case. Any insights? Ideas?
Related questions which did not help are here and here.
An FFT does not produce the result you expect because it is finite in length, and thus closer to the Fourier transform of your sinusoid multiplied by a rectangular window. The length and placement of this rectangular window affect the phase and amplitude of the FFT result.
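The window effect can be made concrete with a small sketch (my own illustration, using a 5 Hz tone instead of the question's sin(t)): when the window holds an integer number of periods, the peak bin alone recovers the amplitude as 2|X[k]|/N; stretch the window slightly and spectral leakage distorts the estimate.
import numpy as np

f_s = 200                 # sampling rate [Hz]
f0 = 5.0                  # illustrative signal frequency [Hz] (my choice)
A = 4.0

for T in (2.0, 2.05):     # exactly 10 periods vs. 10.25 periods in the window
    t = np.arange(0, T, 1 / f_s)
    x = A * np.sin(2 * np.pi * f0 * t)
    X = np.fft.rfft(x)
    k = np.argmax(np.abs(X))
    print(f"T={T} s: 2|X[k]|/N = {2 * np.abs(X[k]) / len(x):.3f} (true A = {A})")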

FFT normalization with numpy

Just started working with numpy package and started it with the simple task to compute the FFT of the input signal. Here's the code:
import numpy as np
import matplotlib.pyplot as plt
#Some constants
L = 128
p = 2
X = 20
x = np.arange(-X/2,X/2,X/L)
fft_x = np.linspace(0,128,128, True)
fwhl = 1
fwhl_y = (2/fwhl) * (np.log(2)/np.pi)**0.5 * np.e**(-(4*np.log(2)*x**2)/fwhl**2)
fft_fwhl = np.fft.fft(fwhl_y, norm='ortho')
ampl_fft_fwhl = np.abs(fft_fwhl)
plt.bar(fft_x, ampl_fft_fwhl, width=.7, color='b')
plt.show()
Since I work with an exponential function, with a constant factor involving pi in front of it, I expect to get an exponential function in Fourier space as well, where the zero-frequency component of the FFT is always equal to 1.
But the value of that component I get using numpy is larger (about 1.13). Here I have an amplitude spectrum which is normalized by 1/(number_of_counts)**0.5 (that's what I read in the numpy documentation). I can't understand what's wrong... Can anybody help me?
Thanks!
[EDITED] It seems the problem is solved: all you need to make the Fourier integral and the FFT agree is to multiply the FFT by the step (in my case X/L). As for the norm='ortho' option of numpy.fft.fft, it is used only to preserve the scale of the transform; otherwise you need to divide the result of the inverse FFT by the number of samples. Thanks everyone for the help!
I've finally solved my problem. All you need to connect the FFT with the Fourier integral is to multiply the result of the transform by the step (X/L in my case, i.e. FFT·(X/L)); this works in general. My case is a bit more complex, since I have an extra rule for the function to be transformed: the area under the curve must equal 1, because it is a model of a δ function. Since the step is fixed, I have to satisfy step·sum(fwhl_y) = 1, that is X/L = 1/sum(fwhl_y). So to get the correct result I have to do the following:
calculate the FFT: fft_fwhl = np.fft.fft(fwhl_y)
get rid of the phase component, which appears because fwhl_y is symmetric, i.e. defined on the interval [-T/2, T/2] (where T is the period), while np.fft.fft assumes the function is defined on [0, T]. Since I only need the amplitude spectrum, I simply use np.abs(FFT)
multiply the result of the previous step by X/L, i.e. np.abs(FFT)*X/L
apply the extra condition on the area under the curve, X/L*sum(fwhl_y) = 1, so finally np.abs(FFT)*X/L = np.abs(FFT)/sum(fwhl_y)
Hope this helps someone.
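As a quick check of the step-scaling claim, here is a sketch of mine reusing the question's constants: the zero-frequency bin of FFT·(X/L) should match the area under fwhl_y, which is 1 by construction for this normalized Gaussian.
import numpy as np

L, X, fwhl = 128, 20, 1
x = np.arange(-X/2, X/2, X/L)
fwhl_y = (2/fwhl) * (np.log(2)/np.pi)**0.5 * np.exp(-(4*np.log(2)*x**2)/fwhl**2)

F = np.fft.fft(fwhl_y) * (X/L)   # raw FFT scaled by the sample step
area = np.sum(fwhl_y) * (X/L)    # Riemann-sum approximation of the integral
print(np.abs(F[0]), area)        # both come out ≈ 1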
Here's a possible solution to your problem:
import numpy as np
import matplotlib.pyplot as plt
from scipy import fft
from numpy import log, pi, e
# Signal setup
Fs = 150
Ts = 1.0 / Fs
t = np.arange(0, 1, Ts)
ff = 50
fwhl = 1
y = (2 / fwhl) * (log(2) / pi)**0.5 * e**(-(4 * log(2) * t**2) / fwhl**2)
# Plot original signal
plt.subplot(2, 1, 1)
plt.plot(t, y, 'k-')
plt.xlabel('time')
plt.ylabel('amplitude')
# Normalized FFT
plt.subplot(2, 1, 2)
n = len(y)
k = np.arange(n)
T = n / Fs
frq = k / T
freq = frq[range(n // 2)]  # integer division so the code runs under Python 3
Y = np.fft.fft(y) / n
Y = Y[range(n // 2)]
plt.plot(freq, abs(Y), 'r-')
plt.xlabel('freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()
[Output plots omitted: one run with fwhl=1 and one with fwhl=0.1.]
You can see in those graphs how the exponential and its FFT vary as fwhl approaches 0.

Plotting only one side of gaussian in Python using matplotlib and scipy

I have a set of points in the first quadrant that look like a Gaussian, and I am trying to fit them using a Gaussian in Python. My code is as follows:
import pylab as plb
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from numpy import asarray as ar, exp
import math
x = ar([37,69,157,238,274,319,391,495,533,626,1366,1855,2821,3615,4130,4374,6453,6863,7021,
        7951,8646,9656,10464,11400])
y = ar([1.77,1.67,1.65,1.17,1.34,1.46,0.75,1,0.8,1.02,0.65,0.69,0.44,0.44,0.55,0.43,0.75,0.27,0.26,
        0.44,0.04,0.44,0.26,0.04])
n = 24 #the number of data
mean = sum(x*y)/n #note this correction
sigma = math.sqrt(sum(y*(x-mean)**2)/n) #note this correction
def gaus(x,a,x0,sigma):
    return a*exp(-(x-x0)**2/(2*sigma**2))

popt,pcov = curve_fit(gaus,x,y,p0=None,sigma=None)  # try p0=[1,mean,sigma]
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()
And the output is this figure:
http://s2.postimg.org/wevggkc95/Workspace_1_022.png
Why are all the red points coming out below the data? Note also that I am interested in a half Gaussian, since my data looks like that: the y values are large at first and then decrease like one side of the Gaussian bell. Can anyone tell me how to fit this curve in Python, in case it cannot be fit to a Gaussian? In other words, I want code to fit the half (left-side) Gaussian of my points (in the first quadrant only). Note that my points cannot be fit as an exponentially decreasing curve; I tried that earlier, and it does not fit well at lower x values.
Apparently your data do not fit well or easily to a Gaussian function. You use the default initial guesses for p0 = [1,1,1] which is so far away from any kind of optimal choice that curve_fit gives up before it gets started (check the values of popt=[1,1,1] and pcov=[inf, inf, inf]). You could try with better guesses (e.g. p0 = [2,0, 2000]), but on my system it won't converge: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
To fit a "half-Gaussian", don't float the centre position x0 (just leave it equal to 0):
def gaus(x,a,sigma):
    return a*exp(-(x)**2/(2*sigma**2))
p0 = [1.2, 4000]
popt,pcov = curve_fit(gaus,x,y,p0=p0)
Unless you have a particular reason for wanting to fit a Gaussian, why not do a more robust linear least squares fit to a polynomial, e.g.:
pfit = np.polyfit(x, y, 3)
poly = np.poly1d(pfit)
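Putting the two suggestions together, a runnable sketch (assuming the question's x and y arrays are in scope; the p0 values are the guesses proposed above):
import numpy as np
from scipy.optimize import curve_fit

def half_gaus(x, a, sigma):
    # centre fixed at x0 = 0, so only one side of the bell is fitted
    return a * np.exp(-x**2 / (2 * sigma**2))

popt, pcov = curve_fit(half_gaus, x, y, p0=[1.2, 4000])
print("a = %.2f, sigma = %.0f" % tuple(popt))

# or the more robust polynomial alternative:
poly = np.poly1d(np.polyfit(x, y, 3))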

Bayesian Covariance Prediction with PyMC

I'm trying to use pyMC to provide a Bayesian estimate of a covariance matrix given some data. I'm roughly following the stock covariance example provided in this online guide (link here), but I have a simpler example model that I made up. I draw two values from a multivariate normal distribution, constructed so that I know the covariance/correlation between the two variables.
I've posted my short code below. Essentially what I'm doing is constructing an artificial data set where the correlation matrix should be [[1, -0.5], [-0.5, 1]]. At the end of the mcmc sampling, I get a predicted value for the off-diagonal term that is quite a bit different. I've looked at the convergence criteria, and it looks like the autocorrelation is low and the distribution is stationary. However, I will admit I'm still wrapping my head around all the nuances here and there could be aspects of this that are still beyond my grasp.
This question is related to and very much based on these other two SO questions (One and Two). I felt the need to ask my own question despite the similarity because I'm not getting the answer I expect to get. If any of you computational statisticians out there can help provide insight into this problem it would be greatly appreciated!
import numpy as np
import pandas as pd
import pymc as pm
import matplotlib.pyplot as plt
import seaborn as sns
p=2
prior_mu=np.ones(p)
prior_sdev=np.ones(p)
prior_corr_inv=np.eye(p)
def cov2corr(A):
    """Convert a covariance matrix to a correlation matrix."""
    d = np.sqrt(A.diagonal())
    A = ((A.T / d).T) / d
    return A
# construct artificial data set
muVector=[10,5]
sdevVector=[3.,5.]
corrMatrix=np.matrix([[1,-0.5],[-0.5, 1]])
cov_matrix=np.diag(sdevVector)*corrMatrix*np.diag(sdevVector)
n_obs = 500
x = np.random.multivariate_normal(muVector,cov_matrix,n_obs)
prior_mu = np.array(muVector)
prior_std = np.array(sdevVector)
inv_cov_matrix = pm.Wishart( "inv_cov_matrix", n_obs, np.diag(prior_std**2) )
mu = pm.Normal( "returns", prior_mu, 1, size = 2)
# create the model and sample
obs = pm.MvNormal( "observed returns", mu, inv_cov_matrix, observed = True, value = x )
model = pm.Model( [obs, mu, inv_cov_matrix] )
mcmc = pm.MCMC(model)
mcmc.use_step_method(pm.AdaptiveMetropolis,inv_cov_matrix)
mcmc.sample( 1e5, 2e4, 10)
# Determine prediction - Does not equal corrMatrix!
inv_cov_samples = mcmc.trace("inv_cov_matrix")[:]
mean_covariance_matrix = np.linalg.inv( inv_cov_samples.mean(axis=0) )
prediction = cov2corr(mean_covariance_matrix*n_obs)
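One sanity check worth running first (my addition, not part of the model): confirm that the simulated draws actually carry the intended -0.5 correlation, so any mismatch can be pinned on the prior or the Wishart parameterization rather than the data generation.
import numpy as np

# x is the (n_obs, 2) sample drawn above; the off-diagonal entry of the
# empirical correlation matrix should sit near -0.5 for n_obs = 500
print(np.corrcoef(x.T))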

Latin hypercube sampling with python

I would like to sample a distribution defined by a function in multiple dimensions (2,3,4):
f(x, y, ...) = ...
The distributions might be ugly and non-standard (like a 3D spline on data, a sum of Gaussians, etc.). To this end I would like to sample the 2..4 dimensional space uniformly, and then, with an additional random number, accept or reject each point into my sample.
Is there a ready-to-use Python lib for this purpose?
Is there a Python lib for generating the points in this 2..4 dimensional space with Latin hypercube sampling, or with another uniform sampling method? Brute-force sampling with independent random numbers usually leaves some regions of the space more densely sampled than others.
If 1) and 2) don't exist, is there anybody kind enough to share an implementation for the same or a similar problem?
I'll use it in Python code, but links to other solutions are also appreciated.
I guess this is a late answer, but this is also for future visitors. I have just put up an implementation of multi-dimensional uniform Latin Hypercube sampling on git. It is minimal, but very easy to use. You can generate uniform random variables sampled in n dimensions using Latin Hypercube Sampling, if your variables are independent. Below is an example plot comparing Monte Carlo and Latin Hypercube Sampling with Multi-dimensional Uniformity (LHS-MDU) in two dimensions with zero correlation.
import lhsmdu
import matplotlib.pyplot as plt
import numpy
l = lhsmdu.sample(2,10) # Latin Hypercube Sampling of two variables, and 10 samples each.
k = lhsmdu.createRandomStandardUniformMatrix(2,10) # Monte Carlo Sampling
fig = plt.figure()
ax = fig.gca()
ax.set_xticks(numpy.arange(0,1,0.1))
ax.set_yticks(numpy.arange(0,1,0.1))
plt.scatter(k[0], k[1], color="b", label="MC")
plt.scatter(l[0], l[1], color="r", label="LHS-MDU")
plt.legend()
plt.grid()
plt.show()
Now the pyDOE library provides a tool to generate Latin-hypercube-based samples.
https://pythonhosted.org/pyDOE/randomized.html
to generate samples over n dimensions:
lhs(n, [samples, criterion, iterations])
where n is the number of dimensions and samples is the number of sample points to generate.
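A minimal usage sketch (assuming pyDOE is installed):
from pyDOE import lhs

points = lhs(4, samples=50)  # 50 Latin-hypercube points in the 4-D unit hypercube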
Here is an update of Sahil M's answer for Python 3 (update from Python 2 to Python 3 and some minor code changes to match code and figure):
import lhsmdu
import matplotlib.pyplot as plt
import numpy
l = lhsmdu.sample(2,10) # Latin Hypercube Sampling of two variables, and 10 samples each.
k = lhsmdu.createRandomStandardUniformMatrix(2,10) # Monte Carlo Sampling
fig = plt.figure()
ax = fig.gca()
ax.set_xticks(numpy.arange(0,1,0.1))
ax.set_yticks(numpy.arange(0,1,0.1))
plt.scatter([k[0]], [k[1]], color="r", label="MC")
plt.scatter([l[0]], [l[1]], color="b", label="LHS-MDU")
plt.legend()
plt.grid()
plt.show()
I once encountered a Python memory error running this script. Any suggestions why this could happen, or how to change the script so it doesn't happen in the future?
Latin Hypercube sampling is now part of SciPy since version 1.7. See the doc.
from scipy.stats.qmc import LatinHypercube
engine = LatinHypercube(d=2)
sample = engine.random(n=100)
It supports centering, strength, and optimization.
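The samples live in the unit hypercube; to map them onto your problem's bounds there is qmc.scale (shown here with hypothetical 2-D bounds):
from scipy.stats import qmc

engine = qmc.LatinHypercube(d=2)
sample = engine.random(n=100)
scaled = qmc.scale(sample, l_bounds=[0.0, 3.0], u_bounds=[1.0, 11.0])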
This 2-D example samples uniformly on two dimensions, keeps each grid point with constant probability (so the number of kept points is binomially distributed), selects those points randomly and without replacement from the sample space, and generates a pair of vectors that you can then pass to your function f:
import numpy as np
import random

resolution = 10
keepprob = 0.5
min1, max1 = 0., 1.
min2, max2 = 3., 11.

# Binomially distributed number of grid points to keep
keepnumber = np.random.binomial(resolution * resolution, keepprob, 1)
array1, array2 = np.meshgrid(np.linspace(min1, max1, resolution), np.linspace(min2, max2, resolution))
# Choose that many grid points randomly, without replacement
random_indexes = random.sample(range(resolution * resolution), int(keepnumber))
random_indexes.sort()
vec1Sampled, vec2Sampled = array1.flatten()[random_indexes], array2.flatten()[random_indexes]
