Calculate incremental weighted mean and covariance in python - python

I've been trying to implement an incremental weighted mean and covariance calculator. So given a data matrix X of size (N,D) and weights of size (N,), I need to be able to get its weighted statistics (mean vector of size (D,)) and covariance matrix of size (D,D)) by running update in the class below
I've gotten the weighted mean part but I'm unsure how to proceed with calculating the covariance matrix in this manner.
import numpy as np
class OnlineWeightStats:
def __init__(self, X):
N, D = X.shape
self.data_mat = X
self.mean = np.zeros(D)
self.wsum = 1e-7
def update(self, weights=None):
if weights is None:
weights = np.ones(self.data_mat.shape[0])
for d, w in zip(self.data_mat,weights):
self.wsum += w
upcon = (w / self.wsum)
delta = d - self.mean
self.mean += delta * upcon
if __name__ == "__main__":
N = 10000
D = 2
X = np.random.rand(N,D)
# Define OnlineStats
ows = OnlineWeightStats(X)
# Update 1
ows.update() # No weights, so just calculates standard mean
# Change weights and update again
new_weights = np.random.uniform(0,1,N)
ows.update(new_weights)
# Change weights and update
......

Got the solution. I'm thinking this is the best that Python (with numpy) can do unless we count writing it as a C/C++ extension.
import numpy as np
class OnlineWeightStats:
def __init__(self, X):
N, D = X.shape
self.data_mat = X
self.mean = np.zeros(D)
self.wsum = 1e-7
self.xpsum = np.zeros((D,D)) # Sum of cross-products
def update(self, weights=None):
if weights is None:
weights = np.ones(self.data_mat.shape[0])
for d, w in zip(self.data_mat,weights):
self.wsum += w
upcon = (w / self.wsum)
delta = d - self.mean
# Weighted mean computation
self.mean += delta * upcon
# Weighted covariance computation
self.xpsum += np.outer(delta, delta) * w * (1-upcon)
self.cov = self.xpsum/self.wsum

Related

Singular Covariance Matrix in Expectation Maximization

I was trying to code up an EM algorithm in python for clustering images of different types. My understanding of the EM algorithm is as follows:
Accordingly, I coded the same in python. Here's my code:
import numpy as np
import sys
from scipy import stats
# μ: mean vector ∈ R^(self.m x self.n)
# Σ: covariance matrix ∈ R^(self.m x self.n x self.n)
# π: probabilities of each component
class gmm:
def __init__(self,n_components):
self.m = n_components
self.π = np.random.random((self.m,))
self.x = None
self.Σ = None
self.μ = None
self.r = None
self.n = None # length of data
#classmethod
def N(cls, x, μ, Σ):
#print(Σ)
#print('\n')
return stats.multivariate_normal(μ, Σ).pdf(x)
def E_step(self):
for i,x_i in enumerate(self.x):
den = 0
for c in range(self.m):
#print(self.Σ[c].shape)
#print(self.μ[c].shape)
#sys.exit()
den+= self.π[c]*gmm.N(x_i,self.μ[c],self.Σ[c])
#print(f'{den=}')
for c in range(self.m):
num = self.π[c]*gmm.N(x_i,self.μ[c],self.Σ[c])
self.r[i,c] = num/den
print(f'{self.r}\n')
def M_step(self):
m_c = np.sum(self.r, axis = 0)
self.π = m_c/self.m
for c in range(self.m):
s1 = 0
s2 = 0
for i in range(self.n):
s1+= self.r[i,c]*self.x[i]
s2+= self.r[i,c]*(self.x[i]-self.μ[c]).dot(self.x[i]-self.μ[c])
self.μ[c] = s1/m_c[c]
self.Σ[c] = s2/m_c[c]
def fit(self,x, iterations = 10):
self.x = x
self.n = x.shape[0]
self.r = np.empty((self.n, self.m))
self.μ = np.random.random((self.m, self.n))
Sigma = [np.random.random((self.n, self.n)) for i in range(self.m)]
Sigma = [0.5*(s + s.T)+5*np.eye(s.shape[0]) for s in Sigma] # A symmetric diagonally dominant matrix is PD
#print([np.linalg.det(s) for s in Sigma])
self.Σ = np.asarray(Sigma)
for i in range(iterations):
self.E_step()
self.M_step()
def params():
return self.π, self.μ, self.Σ
if __name__ == '__main__':
data_dim = 5 # No. of data dimensions
data = [[]]*6
data[0] = np.random.normal(0.5,2, size = (300,))
data[1] = np.random.normal(3,2, size = (300,))
data[2] = np.random.normal(-1, 0.1, size = (300,))
data[3] = np.random.normal(2,3.14159, size = (300,))
data[4] = np.random.normal(0,1, size = (300,))
data[5] = np.random.normal(3.5,5, size = (300,))
p = [0.1, 0.15, 0.22, 0.3, 0.2, 0.03]
vals = [0,1,2,3,4,5]
combined = []
for i in range(data_dim):
choice = np.random.choice(vals, p = p)
combined.append(np.random.choice(data[choice]))
combined = np.array(combined)
G = gmm(n_components = 6)
G.fit(combined)
pi, mu, sigma = G.params()
print(pi)
print(mu)
print(sigma)
Now comes the problem. While running the code, the covariance matrix Σ becomes singular after some iterations. Specifically, all the entries of Sigma become the same all of a sudden in a particular iteration.
I have tried adding some random noise to Σ while this happens, but this seems to only delay the inevitable.
Any help/comments will be greatly appreciated. Thanks in Advance :)
To prevent the covariance matrix from becoming singular, you could add an arbitrary value along the diagonal of the matrix, i.e.
val * np.identity(size)
as this ensures that the covariance matrix will remain positive definite, and have an inverse. For instance, sklearn uses the default value 1e-6 for their regularization.

Pytorch: multiplication between parameters is inplace for LBFGS optimizer?

I am trying to solve a kind of inverse problem by backward propagation with pytorch. I am trying to recover the parameters (r, theta) that generate a vector field U(r,theta).
As I intended to use the LBFGS optimizer from pytorch, I realize that the operation
r*theta
is detected as inplace and thus not supported for the backward computation of the gradient, whereas
r+theta is not.
How can I overcome this ? I actually need to recover fields that use transformations of the form r*theta.
Here is an example of a code that reproduces the error: it is running fine if you change
field = Wrong_U_param(r, theta, positions)
by
field = U_param(r, theta, positions)
in the loop. Is also works if you replace the r*theta operation by r.item()*theta (but is does not optimize over r since there is no more gradient depending on r.
I tried to use torch.mul() to run the product but it also fails.
The error message is the following
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
and the automatic detection points towards this very product.
Thank you for your help !
import numpy as np
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
import torch.optim as optim
from geomloss import SamplesLoss
torch.autograd.set_detect_anomaly(True)
def model(field):
return field
def U_param(r, theta, pos):
result = r + theta + 0. * pos
return result
def Wrong_U_param(r, theta, pos):
result = r * theta + 0. * pos
return result
def learn_U_param(Zobs, ngrad, params, r_guess=0., theta_guess=0., lambd=1.):
Npts = params[0]
positions = torch.tensor(np.arange(0, 1, 1 / Npts) + 1 / 2 / Npts).reshape((Npts, 1))
lab = torch.tensor(np.arange(0, Npts))
r = torch.tensor(float(r_guess)).to(device)
r.requires_grad = True
theta = torch.tensor(float(theta_guess)).to(device)
theta.requires_grad = True
r_hist = [r.item()]
theta_hist = [theta.item()]
loss_hist = []
optimizer = optim.LBFGS([r, theta])
for i in range(ngrad):
field = Wrong_U_param(r, theta, positions)
Z = model(field)
Loss = SamplesLoss(loss="sinkhorn", p=2, blur=.05)
Wass = Loss(lab, Z, positions, lab, Zobs, positions)
def closure():
optimizer.zero_grad()
Wass.backward(retain_graph=True)
return Wass
optimizer.step(closure)
optimizer.zero_grad()
r_hist.append(r.item())
theta_hist.append(theta.item())
loss_hist.append(Wass.item())
return r_hist, theta_hist, loss_hist
N=100
r = 2
theta = 2
params = [N]
positions = torch.tensor(np.arange(0, 1, 1 / N) + 1 / 2 / N).reshape((N, 1))
Zobs = U_param(r, theta, positions)
ngrad = 10
print(learn_U_param(Zobs, ngrad, params, r_guess=0.1, theta_guess=0.1, lambd=1.))

Get an X(std) value of efficient frontier from a given Y(mean) value with cvxopt?

I'm trying to do portfolio optimization with cvxopt (Python), I'm able to get the efficient frontier with the following code, however, I'm not able to specify a Y value (mean or return) and get a corresponding X value (std or risk), if anyone has knowledge about this, I would be more than grateful if you can share it:
def optimal_portfolio(returns_vec, cov_matrix):
n = len(returns_vec)
N = 1000
mus = [10 ** (5 * t / N - 1.0) for t in range(N)]
# Convert to cvxopt matrices
S = opt.matrix(cov_matrix)
pbar = opt.matrix(returns_vec)
# Create constraint matrices
G = -opt.matrix(np.eye(n)) # negative n x n identity matrix
h = opt.matrix(0.0, (n, 1))
A = opt.matrix(1.0, (1, n))
b = opt.matrix(1.0)
#solvers.options["feastol"] =1e-9
# Calculate efficient frontier weights using quadratic programming
portfolios = [solvers.qp(mu * S, -pbar, G, h, A, b)['x']
for mu in mus]
## CALCULATE RISKS AND RETURNS FOR FRONTIER
returns = [blas.dot(pbar, x) for x in portfolios]
risks = [np.sqrt(blas.dot(x, S * x)) for x in portfolios]
## CALCULATE THE 2ND DEGREE POLYNOMIAL OF THE FRONTIER CURVE
m1 = np.polyfit(returns, risks, 2)
x1 = np.sqrt(m1[2] / m1[0])
# CALCULATE THE OPTIMAL PORTFOLIO
wt = solvers.qp(opt.matrix(x1 * S), -pbar, G, h, A, b)['x']
return np.asarray(wt), returns, risks
return_vec = [0.055355,
0.010748,
0.041505,
0.074884,
0.039795,
0.065079
]
cov_matrix =[[ 0.005329, -0.000572, 0.003320, 0.006792, 0.001580, 0.005316],
[-0.000572, 0.000625, 0.000606, -0.000266, -0.000107, 0.000531],
[0.003320, 0.000606, 0.006610, 0.005421, 0.000990, 0.006852],
[0.006792, -0.000266, 0.005421, 0.011385, 0.002617, 0.009786],
[0.001580, -0.000107, 0.000990, 0.002617, 0.002226, 0.002360],
[0.005316, 0.000531, 0.006852, 0.009786, 0.002360, 0.011215]]
weights, returns, risks = optimal_portfolio(return_vec, cov_matrix)
fig = plt.figure()
plt.ylabel('Return')
plt.xlabel('Risk')
plt.plot(risks, returns, 'y-o')
print(weights)
plt.show()
I would like to find the corresponding risk value for a given return value.
Thank you very much!
Not sure I properly understood your question but if you are looking to find a solution for a given return then you should use the min return required as a constraint
def optimal_portfolio(returns_vec, cov_matrix, Y):
h1 = opt.matrix(0.0, (n, 1))
h2 = np.matrix(-np.ones((1,1))*Y)
h = concatenate (h1,h2)
This will eventually return the optimal portfolio based on a lowerbound Y. You can then add an upper bound in the same way so that you can fix the value of Y

SKlearn Gaussian Process with constant, manually set correlation

I want to use the Gaussian Process approximation for a simple 1D test function to illustrate a few things. I want to iterate over a few different values for the correlation matrix (since this is 1D it is just a single value) and show what effect different values have on the approximation. My understanding is, that "theta" is the parameter for this. Therefore I want to set the theta value manually and don't want any optimization/changes to it. I thought the constant kernel and the clone_with_theta function might get me what I want but I didn't get it to work. Here is what I have so far:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as ConstantKernel
def f(x):
"""The function to predict."""
return x/2 + ((1/10 + x) * np.sin(5*x - 1))/(1 + x**2 * (np.sin(x - (1/2))**2))
# ----------------------------------------------------------------------
# Data Points
X = np.atleast_2d(np.delete(np.linspace(-1,1, 7),4)).T
y = f(X).ravel()
# Instantiate a Gaussian Process model
kernel = ConstantKernel(constant_value=1, constant_value_bounds='fixed')
theta = np.array([0.5,0.5])
kernel = kernel.clone_with_theta(theta)
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None)
# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, y)
# Make the prediction on the meshed x-axis (ask for MSE as well)
y_pred, sigma = gp.predict(x, return_std=True)
# Plot
# ...
I programmed a simple implementation myself now, which allows to set correlation (here 'b') manually:
import numpy as np
from numpy.linalg import inv
def f(x):
"""The function to predict."""
return x/2 + ((1/10 + x) * np.sin(5*x - 1))/(1 + x**2 * (np.sin(x - (1/2))**2))
def kriging_approx(x,xt,yt,b,mu,R_inv):
N = yt.size
one = np.matrix(np.ones((yt.size))).T
r = np.zeros((N))
for i in range(0,N):
r[i]= np.exp(-b * (xt[i]-x)**2)
y = mu + np.matmul(np.matmul(r.T,R_inv),yt - mu*one)
y = y[0,0]
return y
def calc_R (x,b):
N = x.size
# setup R
R = np.zeros((N,N))
for i in range(0,N):
for j in range(0,N):
R[i][j] = np.exp(-b * (x[i]-x[j])**2)
R_inv = inv(R)
return R, R_inv
def calc_mu_sig (yt, R_inv):
N = yt.size
one = np.matrix(np.ones((N))).T
mu = np.matmul(np.matmul(one.T,R_inv),yt) / np.matmul(np.matmul(one.T,R_inv),one)
mu = mu[0,0]
sig2 = (np.matmul(np.matmul((yt - mu*one).T,R_inv),yt - mu*one))/(N)
sig2 = sig2[0,0]
return mu, sig2
# ----------------------------------------------------------------------
# Data Points
xt = np.linspace(-1,1, 7)
yt = np.matrix((f(xt))).T
# Calc R
R, R_inv = calc_R(xt, b)
# Calc mu and sigma
mu_dach, sig_dach2 = calc_mu_sig(yt, R_inv)
# Point to get approximation for
x = 1
y_approx = kriging_approx(x, xt, yt, b, mu_dach, R_inv)

How to get accurate predictions from Neural Network?

I'm doing a project on water quality prediction using Artificial Neural Network. I implemented this using python. I have completed my prediction model but the generated predictions are not much accurate.
What I'm doing is I have collected data from a river for past 4 and half years on daily basis and I'm predicting a pattern for a specific parameter by inputting data from past records. Simply what I need to do is to predict "Turbidity level" of water on 2015 by feeding data on turbidity from 2012-2014.
From the model which I have created it is not much accurate when I compare to the real data I have gathered for 2015. Please help me to solve this. I tried this by changing hidden layer sizes and the Lambda value.
//This is my code
import xlrd
import numpy as np
from numpy import zeros
from scipy.optimize import minimize
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy import optimize
#Neural Network
class Neural_Network(object):
def __init__(self,Lambda):
#Define Hyperparameters
self.inputLayerSize = 2
self.outputLayerSize = 1
self.hiddenLayerSize = 10
#Weights (parameters)
self.W1 = np.random.randn(self.inputLayerSize,self.hiddenLayerSize)
self.W2 = np.random.randn(self.hiddenLayerSize,self.outputLayerSize)
#Regularization Parameter:
self.Lambda = Lambda
def forward(self, arrayInput):
#Propogate inputs though network
self.z2 = np.dot(arrayInput, self.W1)
self.a2 = self.sigmoid(self.z2)
self.z3 = np.dot(self.a2, self.W2)
yHat = self.sigmoid(self.z3)
return yHat
def sigmoid(self, z):
#Apply sigmoid activation function to scalar, vector, or matrix
return 1/(1+np.exp(-z))
def sigmoidPrime(self,z):
#Gradient of sigmoid
return np.exp(-z)/((1+np.exp(-z))**2)
def costFunction(self, arrayInput, arrayOutput):
#Compute cost for given input,output use weights already stored in class.
self.yHat = self.forward(arrayInput)
#J = 0.5*sum((arrayOutput-self.yHat)**2)
#J = 0.5*sum((arrayOutput-self.yHat)**2)/arrayInput.shape[0] + (self.Lambda/2)
J = 0.5*sum((arrayOutput-self.yHat)**2)/arrayInput.shape[0] + (self.Lambda/2)*sum(sum(self.W1**2),sum(self.W2**2))
#J = 0.5*sum((arrayOutput-self.yHat)**2)/arrayInput.shape[0] + (self.Lambda/2)*(sum(self.W1**2)+sum(self.W2**2))
return J
def costFunctionPrime(self, arrayInput, arrayOutput):
#Compute derivative with respect to W and W2 for a given X and y:
self.yHat = self.forward(arrayInput)
delta3 = np.multiply(-(arrayOutput-self.yHat), self.sigmoidPrime(self.z3))
#Add gradient of regularization term:
#dJdW2 = np.dot(self.a2.T, delta3) + self.Lambda*self.W2
dJdW2 = np.dot(self.a2.T, delta3)
delta2 = np.dot(delta3, self.W2.T)*self.sigmoidPrime(self.z2)
#Add gradient of regularization term:
#dJdW1 = np.dot(arrayInput.T, delta2)+ self.Lambda*self.W1
dJdW1 = np.dot(arrayInput.T, delta2)
return dJdW1, dJdW2
#Helper Functions for interacting with other classes:
def getParams(self):
#Get W1 and W2 unrolled into vector:
params = np.concatenate((self.W1.ravel(), self.W2.ravel()))
return params
def setParams(self, params):
#Set W1 and W2 using single paramater vector.
W1_start = 0
W1_end = self.hiddenLayerSize * self.inputLayerSize
self.W1 = np.reshape(params[W1_start:W1_end], (self.inputLayerSize , self.hiddenLayerSize))
W2_end = W1_end + self.hiddenLayerSize*self.outputLayerSize
self.W2 = np.reshape(params[W1_end:W2_end], (self.hiddenLayerSize, self.outputLayerSize))
def computeGradients(self, arrayInput, arrayOutput):
dJdW1, dJdW2 = self.costFunctionPrime(arrayInput, arrayOutput)
return np.concatenate((dJdW1.ravel(), dJdW2.ravel()))
def computeNumericalGradient(self,N, X, y):
paramsInitial = N.getParams()
numgrad = np.zeros(paramsInitial.shape)
perturb = np.zeros(paramsInitial.shape)
e = 1e-4
for p in range(len(paramsInitial)):
#Set perturbation vector
perturb[p] = e
N.setParams(paramsInitial + perturb)
loss2 = N.costFunction(X, y)
N.setParams(paramsInitial - perturb)
loss1 = N.costFunction(X, y)
#Compute Numerical Gradient
numgrad[p] = (loss2 - loss1) / (2*e)
#Return the value we changed to zero:
perturb[p] = 0
#Return Params to original value:
N.setParams(paramsInitial)
return numgrad
#Trainer class
class trainer(object):
def __init__(self, N):
self.N = N
def costFunctionWrapper(self, params, arrayInput, arrayOutput):
self.N.setParams(params)
cost = self.N.costFunction(arrayInput, arrayOutput)
#grad = self.N.computeGradients(arrayInput, arrayOutput)
grad = self.N.computeNumericalGradient(self.N,arrayInput, arrayOutput)
return cost, grad
def callbackF(self, params):
self.N.setParams(params)
self.J.append(self.N.costFunction(self.arrayInput, self.arrayOutput))
self.testJ.append(self.N.costFunction(self.TestInput, self.TestOutput))
def train(self, arrayInput, arrayOutput,TestInput,TestOutput):
#Make an internal variable for the callback function:
self.arrayInput = arrayInput
self.arrayOutput = arrayOutput
self.TestInput = TestInput
self.TestOutput = TestOutput
#Make empty list to store costs:
self.J = []
self.testJ= []
params0 = self.N.getParams()
options = {'maxiter': 200, 'disp' : True}
_res = optimize.minimize(self.costFunctionWrapper, params0, jac=True, method='BFGS', \
args=(arrayInput, arrayOutput), options=options, callback=self.callbackF)
self.N.setParams(_res.x)
self.optimizationResults = _res
#Main Program
path = "F:\prototype\\newdata\\tody\\turbidity\\c.xlsx"
book = xlrd.open_workbook(path)
input1=[]
output=[]
testinput=[]
testoutput=[]
#training data set
first_sheet = book.sheet_by_index(1)
for row in range(first_sheet.ncols-1):
input1.append(first_sheet.col_values(row))
for row in range((first_sheet.ncols-1),first_sheet.ncols ):
output.append(first_sheet.col_values(row))
arrayInput = np.asarray(input1)
arrayInput = arrayInput.T
arrayOutput = np.asarray(output)
arrayOutput = arrayOutput.T
#testing data set
first_sheet1 = book.sheet_by_index(0)
for row in range(first_sheet1.ncols-1):
testinput.append(first_sheet1.col_values(row))
for row in range((first_sheet1.ncols-1),first_sheet1.ncols ):
testoutput.append(first_sheet1.col_values(row))
TestInput = np.asarray(testinput)
TestInput = TestInput.T
TestOutput = np.asarray(testoutput)
TestOutput = TestOutput.T
#2016
input2016=[]
first_sheet2 = book.sheet_by_index(2)
for row in range(first_sheet2.ncols):
input2016.append(first_sheet2.col_values(row))
Input = np.asarray(input2016)
Input = Input.T
# Scaling
arrayInput = arrayInput / np.amax(arrayInput, axis=0)
arrayOutput = arrayOutput / np.amax(arrayOutput, axis=0)
TestInput = TestInput / np.amax(TestInput, axis=0)
Input = Input / np.amax(Input, axis=0)
TestOutput = TestOutput / np.amax(TestOutput, axis=0)
NN=Neural_Network(Lambda=0.00000000000001)
T = trainer(NN)
T.train(arrayInput,arrayOutput,TestInput,TestOutput)
print NN.costFunctionPrime(arrayInput,arrayOutput)
Output = NN.forward(Input)
print Output
print '----------'
#print TestOutput
#plt.plot(T.J)
plt.plot(Output)
plt.grid(1)
plt.xlabel('Iterations')
plt.ylabel('cost')
plt.show()
//Turbidity means 2015 real data and prediction means data predicted using this code
Some of the comments suggest scaling the output sigmoidal layer to match the correct data. If you look at your predictions, you will see that with some scaling they are pretty accurate. I advise against scaling a sigmoidal function, however.
A sigmoidal output is meant to be interpreted as a probability (given certain constraints are followed), so scaling it would be breaking that contract and could give undefined results. What happens if you scale from 0-100, but then start receiving training targets larger than 100? (assuming you are training an online system, otherwise perhaps that example is not relevant)
I would change your code to use a linear output layer. This would not require any manipulation of the data after training the network. Also given that your cost function is least squares, the linear output layer will be convex (which reduces the number of local optima that your algorithm can get stuck in).

Categories

Resources