I have a problem that I don't know how to compute the covariance of two tensor. I have tried the contrib.metrics.streaming_covariance. But is always returns 0. There must be some errors.
You could use the definition of the covariance of two random variables X and Y with the expected values x0 and y0:
cov_xx = 1 / (N-1) * Sum_i ((x_i - x0)^2)
cov_yy = 1 / (N-1) * Sum_i ((y_i - y0)^2)
cov_xy = 1 / (N-1) * Sum_i ((x_i - x0) * (y_i - y0))
The crucial point is to estimate x0 and y0 here, since you normally do not know the probability distribution. In many cases, the mean of the x_i or y_i is estimated to be x_0 or y_0, respectively, i.e. the distribution is estimated to be uniform.
Then you can compute the elements of the covariance matrix as follows:
import tensorflow as tf
x = tf.constant([1, 4, 2, 5, 6, 24, 15], dtype=tf.float64)
y = tf.constant([8, 5, 4, 6, 2, 1, 1], dtype=tf.float64)
cov_xx = 1 / (tf.shape(x)[0] - 1) * tf.reduce_sum((x - tf.reduce_mean(x))**2)
cov_yy = 1 / (tf.shape(x)[0] - 1) * tf.reduce_sum((y - tf.reduce_mean(y))**2)
cov_xy = 1 / (tf.shape(x)[0] - 1) * tf.reduce_sum((x - tf.reduce_mean(x)) * (y - tf.reduce_mean(y)))
with tf.Session() as sess:
sess.run([cov_xx, cov_yy, cov_xy])
print(cov_xx.eval(), cov_yy.eval(), cov_xy.eval())
Of course, if you need the covariance in a matrix form, you can modify the last part as follows:
with tf.Session() as sess:
sess.run([cov_xx, cov_yy, cov_xy])
print(cov_xx.eval(), cov_yy.eval(), cov_xy.eval())
cov = tf.constant([[cov_xx.eval(), cov_xy.eval()], [cov_xy.eval(),
cov_yy.eval()]])
print(cov.eval())
To verify the elements of the TensorFlow way, you can check with numpy:
import numpy as np
x = np.array([1,4,2,5,6, 24, 15], dtype=float)
y = np.array([8,5,4,6,2,1,1], dtype=float)
pc = np.cov(x,y)
print(pc)
You can also try tensorflow probability for easy calculation of correlation or covariance.
x = tf.random_normal(shape=(100, 2, 3))
y = tf.random_normal(shape=(100, 2, 3))
# cov[i, j] is the sample covariance between x[:, i, j] and y[:, i, j].
cov = tfp.stats.covariance(x, y, sample_axis=0, event_axis=None)
# cov_matrix[i, m, n] is the sample covariance of x[:, i, m] and y[:, i, n]
cov_matrix = tfp.stats.covariance(x, y, sample_axis=0, event_axis=-1)
Related
I am trying to numerically compute in python integrals of the form
To that aim, I first define two discrete sets of x and t values, let's say
x_samples = np.linspace(-10, 10, 100)
t_samples = np.linspace(0, 1, 100)
dx = x_samples[1]-x_samples[0]
dt = t_samples[1]-t_samples[0]
declare symbolically that the function g(x,t) is equal to 0 if t<0 and discretise the two functions to integrate as
discretG = g(x_samples[None, :], t_samples[:, None])
discretH = h(x_samples[None, :], t_samples[:, None])
I have then tried to run
discretF = signal.fftconvolve(discretG, discretH, mode='full') * dx * dt
Yet, on basic test functions such as
g(x,t) = lambda x,t: np.exp(-np.abs(x))+t
h(x,t) = lambda x,t: np.exp(-np.abs(x))-t
I don't find an agreement between the the numerical integration and the convolution using scipy and I would like to have a fairly fast way of computing these integrals, especially when I only have access to discretised representations of the functions rather than their symbolic one.
According to your code, I assume you want to conduct convolution on two function g and h that are non-zero only on [a, b]*[m,n].
Of course you can use signal.fftconvolve to compute the convolution. The key is don't forget the transformation between the indices inside discretF and the real coordinates. Here I use interpolation to compute for arbitrary (x,t).
import numpy as np
from scipy import signal, interpolate
a = -1
b = 2
m = -10
n = 15
samples_num = 1000
x_eval_index = 200
t_eval_index = 300
x_samples = np.linspace(a, b, samples_num)
t_samples = np.linspace(m, n, samples_num)
dx = x_samples[1]-x_samples[0]
dt = t_samples[1]-t_samples[0]
g = lambda x,t: np.exp(-np.abs(x))+t
h = lambda x,t: np.exp(-np.abs(x))-t
discretG = g(x_samples[None, :], t_samples[:, None])
discretH = h(x_samples[None, :], t_samples[:, None])
discretF = signal.fftconvolve(discretG, discretH, mode='full')
def compute_f(x, t):
if x < 2*a or x > 2*b or t < 2*m or t > 2*n:
return 0
# use interpolation t get data on new point
x_samples_for_conv = np.linspace(2*a, 2*b, 2*samples_num-1)
t_samples_for_conv = np.linspace(2*m, 2*n, 2*samples_num-1)
f = interpolate.RectBivariateSpline(x_samples_for_conv, t_samples_for_conv, discretF.T)
return f(x, t)[0, 0] * dx * dt
Note: you can extend my codes to compute convolution on a meshgrid defined by x and y, where x and y are 1D array. (In my code, x and y are float now)
You can use the following code to explore the "agreement" between "the numerical integration" and "the convolution using scipy" (and also, the correctness of compute_f function above):
# how the convolve work
# for 1D f[i]=sigma_{j} g[j]h[i-j]
sum = 0
for y_idx, y in enumerate(x_samples[0:]):
for s_idx, s in enumerate(t_samples[0:]):
if x_eval_index - y_idx < 0 or t_eval_index - s_idx < 0:
continue
if t_eval_index - s_idx >= len(x_samples[0:]) or x_eval_index - y_idx >= len(t_samples[0:]):
continue
sum += discretG[t_eval_index - s_idx, x_eval_index - y_idx] * discretH[s_idx, y_idx] * dx * dt
print("Do discrete convolution manually, I get: %f" % sum)
print("Do discrete convolution using scipy, I get: %f" % (discretF[t_eval_index, x_eval_index] * dx * dt))
# numerical integral
# the x_val and t_val
# take 1D convolution as example, function defined on [a, b], and index of your samples range from [0, samples_num-1]
# after convolution, function defined on [2a, 2b], index of your samples range from [0, 2*samples_num-2]
dx_prime = (b-a) / (samples_num-1)
dt_prime = (n-m) / (samples_num-1)
x_eval = 2*a + x_eval_index * dx_prime
t_eval = 2*m + t_eval_index * dt_prime
sum = 0
for y in x_samples[:]:
for s in t_samples[:]:
if x_eval - y < a or x_eval - y > b:
continue
if t_eval - s < m or t_eval - s > n:
continue
if y < a or y >= b:
continue
if s < m or s >= n:
continue
sum += g(x_eval - y, t_eval - s) * h(y, s) * dx * dt
print("Do numerical integration, I get: %f" % sum)
print("The convolution result of 'compute_f' is: %f" % compute_f(x_eval, t_eval))
Which gives:
Do discrete convolution manually, I get: -154.771369
Do discrete convolution using scipy, I get: -154.771369
Do numerical integration, I get: -154.771369
The convolution result of 'compute_f' is: -154.771369
I have tried to use a toy problem of linear regression for implanting the optimisation on the MSE function using the algorithm of gradient decent.
import numpy as np
# Data points
x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])
# MSE function
f = lambda a, b: 1 / len(x) * np.sum(np.power(y - (a * x + b), 2))
# Gradient
def grad_f(v_coefficients):
a = v_coefficients[0, 0]
b = v_coefficients[1, 0]
return np.array([1 / len(x) * np.sum(2 * (y - (a * x + b)) * x),
1 / len(x) * np.sum(2 * (y - (a * x + b)))]).reshape(2, 1)
# Gradient Decent with epsilon as tol vector and alpha as the step/learning rate
def gradient_decent(v_prev):
tol = 10 ** -3
epsilon = np.array([tol * np.ones([2, 1], int)])
alpha = 0.2
v_next = v_prev - alpha * grad_f(v_prev)
if (np.abs(v_next - v_prev) <= epsilon).all():
return v_next
else:
gradient_decent(v_next)
# v_0 is the initial guess
v_0 = np.array([[1], [1]])
gradient_decent(v_0)
I have tried different alpha values but the code never converges (infinite recursion) it seems that the issue is with the stop condition of the recursion, but after few runs the v_next and v_prev bounces between -infinte to infinite
It's great that you are learning machine learning (^_^) by implementing some base algorithms by yourself. Regarding your question, there are two problems in your code, first one is mathematical, the sign in:
def grad_f(v_coefficients):
a = v_coefficients[0, 0]
b = v_coefficients[1, 0]
return np.array([1 / len(x) * np.sum(2 * (y - (a * x + b)) * x),
1 / len(x) * np.sum(2 * (y - (a * x + b)))]).reshape(2, 1)
should be
return -np.array(...)
since
the second one is programming, this kind of code will not return you a result in Python:
def add(x):
new_x = x + 1
if new_x > 10:
return new_x
else:
add(new_x)
you must use return in both clauses of the if statement, so it should be
def add(x):
new_x = x + 1
if new_x > 10:
return new_x
else:
return add(new_x)
There is also a minor issue with the alpha coefficient for these particular data points alpha=0.2 is too big for algorithm to converge, you need to use smaller alpha. I also slightly refactor your initial code using numpy broadcasting convention (https://numpy.org/doc/stable/user/basics.broadcasting.html) to get the following result:
import numpy as np
# Data points
x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])
# MSE function
f = lambda a, b: np.mean(np.power(y - (a * x + b), 2))
# Gradient
def grad_f(v_coefficients):
a = v_coefficients[0, 0]
b = v_coefficients[1, 0]
return -np.array([np.mean(2 * (y - (a * x + b)) * x),
np.mean(2 * (y - (a * x + b)))]).reshape(2, 1)
# Gradient Decent with epsilon as tol vector and alpha as the step/learning rate
def gradient_decent(v_prev):
tol = 1e-3
# epsilon = np.array([tol * np.ones([2, 1], int)]) do not need this, due to numpy broadcasting rules
alpha = 0.1
v_next = v_prev - alpha * grad_f(v_prev)
if (np.abs(v_next - v_prev) <= alpha).all():
return v_next
else:
return gradient_decent(v_next)
# v_0 is the initial guess
v_0 = np.array([[1], [1]])
gradient_decent(v_0)
import numpy as np
import pandas as pd
import numpy as np
from matplotlib import pyplot as pt
def computeCost(X,y,theta):
m=len(y)
predictions= X*theta-y
sqrerror=np.power(predictions,2)
return 1/(2*m)*np.sum(sqrerror)
def gradientDescent(X, y, theta, alpha, num_iters):
m = len(y)
jhistory = np.zeros((num_iters,1))
for i in range(num_iters):
h = X * theta
s = h - y
theta = theta - (alpha / m) * (s.T*X).T
jhistory_iter = computeCost(X, y, theta)
return theta,jhistory_iter
data = open(r'C:\Users\Coding\Desktop\machine-learning-ex1\ex1\ex1data1.txt')
data1=np.array(pd.read_csv(r'C:\Users\Coding\Desktop\machine-learning-ex1\ex1\ex1data1.txt',header=None))
y =np.array(data1[:,1])
m=len(y)
y=np.asmatrix(y.reshape(m,1))
X = np.array([data1[:,0]]).reshape(m,1)
X = np.asmatrix(np.insert(X,0,1,axis=1))
theta=np.zeros((2,1))
iterations = 1500
alpha = 0.01;
print('Testing the cost function ...')
J = computeCost(X, y, theta)
print('With theta = [0 , 0]\nCost computed = ', J)
print('Expected cost value (approx) 32.07')
theta=np.asmatrix([[-1,0],[1,2]])
J = computeCost(X, y, theta)
print('With theta = [-1 , 2]\nCost computed =', J)
print('Expected cost value (approx) 54.24')
theta,JJ = gradientDescent(X, y, theta, alpha, iterations)
print('Theta found by gradient descent:')
print(theta)
print('Expected theta values (approx)')
print(' -3.6303\n 1.1664\n')
predict1 = [1, 3.5] *theta
print(predict1*10000)
Result:
Testing the cost function ...
With theta = [0 , 0]
Cost computed = 32.072733877455676
Expected cost value (approx) 32.07
With theta = [-1 , 2]
Cost computed = 69.84811062494227
Expected cost value (approx) 54.24
Theta found by gradient descent:
[[-3.70304726 -3.64357517]
[ 1.17367146 1.16769684]]
Expected theta values (approx)
-3.6303
1.1664
[[4048.02858742 4433.63790186]]
There are two problems, the first Cost computed was right, but the second one was wrong. And there are 4 element in my gradient descent(suppose to be two)
When you mention "With theta = [-1 , 2]"
and you enter
theta=np.asmatrix([[-1,0],[1,2]])
I think this is incorrect. Assuming that you have single feature and you added a column of 1, and you are trying to do simple linear regression
The correct way should be
np.array([-1,2])
Also where have
predictions= X*theta-y
It would be better if you did
np.dot(X,theta)-y
When you multiply, it's not doing the same thing.
I am trying to solve a complicated problem.
For example, I have a batch of 2D predicted images (softmax output, value between 0 and 1) with size: Batch x H x W and ground truth Batch x H x W
The light gray color pixels are the background with value 0, and the dark gray color pixels are the foreground with value 1. I try to compute the mass center coordinates using scipy.ndimage.center_of_mass on each ground truth image. Then I get the center location point C (red color) for each ground truth. The C points set is Batch x 1.
Now, for each pixel A (yellow color) in the predicted images, I want to get three pixels B1, B2, B3 (blue color) which are the closest to A on the line AC (here C is corresponding location of mass center in ground truth).
I used following code to get the three closest points B1, B2, B3.
def connect(ends, m=3):
d0, d1 = np.abs(np.diff(ends, axis=0))[0]
if d0 > d1:
return np.c_[np.linspace(ends[0, 0], ends[1, 0], m + 1, dtype=np.int32),
np.round(np.linspace(ends[0, 1], ends[1, 1], m + 1))
.astype(np.int32)]
else:
return np.c_[np.round(np.linspace(ends[0, 0], ends[1, 0], m + 1))
.astype(np.int32),
np.linspace(ends[0, 1], ends[1, 1], m + 1, dtype=np.int32)]
So the B points set is Batch x 3 x H x W.
Then, I want to compute like this: |Value(A)-Value(B1)|+|Value(A)-Value(B2)|+|Value(A)-Value(B3)|. The size of the result should be Batch x H x W.
Is there any numpy vectorization tricks that can be used to update the value of each pixel in predicted images? Or can this be solved using pytorch functions? I need to find a method to update the whole image. The predicted image is the softmax output. I cannot use for loop to compute each single value since it will become non-differentiable. Thanks a lot.
As suggested by #Matin, you could consider Bresenham's algorithm to get your points on the AC line.
A simplistic PyTorch implementation could be as follows (directly adapted from the pseudo-code here ; could be optimized):
import torch
def get_points_from_low(x0, y0, x1, y1, num_points=3):
dx = x1 - x0
dy = y1 - y0
xi = torch.sign(dx)
yi = torch.sign(dy)
dy = dy * yi
D = 2 * dy - dx
y = y0
x = x0
points = []
for n in range(num_points):
x = x + xi
is_D_gt_0 = (D > 0).long()
y = y + is_D_gt_0 * yi
D = D + 2 * dy - is_D_gt_0 * 2 * dx
points.append(torch.stack((x, y), dim=-1))
return torch.stack(points, dim=len(x0.shape))
def get_points_from_high(x0, y0, x1, y1, num_points=3):
dx = x1 - x0
dy = y1 - y0
xi = torch.sign(dx)
yi = torch.sign(dy)
dx = dx * xi
D = 2 * dx - dy
y = y0
x = x0
points = []
for n in range(num_points):
y = y + yi
is_D_gt_0 = (D > 0).long()
x = x + is_D_gt_0 * xi
D = D + 2 * dx - is_D_gt_0 * 2 * dy
points.append(torch.stack((x, y), dim=-1))
return torch.stack(points, dim=len(x0.shape))
def get_points_from(x0, y0, x1, y1, num_points=3):
is_dy_lt_dx = (torch.abs(y1 - y0) < torch.abs(x1 - x0)).long()
is_x0_gt_x1 = (x0 > x1).long()
is_y0_gt_y1 = (y0 > y1).long()
sign = 1 - 2 * is_x0_gt_x1
x0_comp, x1_comp, y0_comp, y1_comp = x0 * sign, x1 * sign, y0 * sign, y1 * sign
points_low = get_points_from_low(x0_comp, y0_comp, x1_comp, y1_comp, num_points=num_points)
points_low *= sign.view(-1, 1, 1).expand_as(points_low)
sign = 1 - 2 * is_y0_gt_y1
x0_comp, x1_comp, y0_comp, y1_comp = x0 * sign, x1 * sign, y0 * sign, y1 * sign
points_high = get_points_from_high(x0_comp, y0_comp, x1_comp, y1_comp, num_points=num_points) * sign
points_high *= sign.view(-1, 1, 1).expand_as(points_high)
is_dy_lt_dx = is_dy_lt_dx.view(-1, 1, 1).expand(-1, num_points, 2)
points = points_low * is_dy_lt_dx + points_high * (1 - is_dy_lt_dx)
return points
# Inputs:
# (#todo: extend A to cover all points in maps):
A = torch.LongTensor([[0, 1], [8, 6]])
C = torch.LongTensor([[6, 4], [2, 3]])
num_points = 3
# Getting points between A and C:
# (#todo: what if there's less than `num_points` between A-C?)
Bs = get_points_from(A[:, 0], A[:, 1], C[:, 0], C[:, 1], num_points=num_points)
print(Bs)
# tensor([[[1, 1],
# [2, 2],
# [3, 2]],
# [[7, 6],
# [6, 5],
# [5, 5]]])
Once you have your points, you could retrieve their "values" (Value(A), Value(B1), etc.) using torch.index_select() (note that as of now, this method only accept 1D indices, so you need to unravel your data). All things put together, this would look like something such as the following (extending A from shape (Batch, 2) to (Batch, H, W, 2) is left for exercise...)
# Inputs:
# (#todo: extend A to cover all points in maps):
A = torch.LongTensor([[0, 1], [8, 6]])
C = torch.LongTensor([[6, 4], [2, 3]])
batch_size = A.shape[0]
num_points = 3
map_size = (9, 9)
map_num_elements = map_size[0] * map_size[1]
map_values = torch.stack((torch.arange(0, map_num_elements).view(*map_size),
torch.arange(0, -map_num_elements, -1).view(*map_size)))
# Getting points between A and C:
# (#todo: what if there's less than `num_points` between A-C?)
Bs = get_points_from(A[:, 0], A[:, 1], C[:, 0], C[:, 1], num_points=num_points)
# Get map values in positions A:
A_unravel = torch.arange(0, batch_size) * map_num_elements
A_unravel = A_unravel + A[:, 0] * map_size[1] + A[:, 1]
values_A = torch.index_select(map_values.view(-1), dim=0, index=A_unravel)
print(values_A)
# tensor([ 1, -4])
# Get map values in positions A:
A_unravel = torch.arange(0, batch_size) * map_num_elements
A_unravel = A_unravel + A[:, 0] * map_size[1] + A[:, 1]
values_A = torch.index_select(map_values.view(-1), dim=0, index=A_unravel)
print(values_A)
# tensor([ 1, -78])
# Get map values in positions B:
Bs_flatten = Bs.view(-1, 2)
Bs_unravel = (torch.arange(0, batch_size)
.unsqueeze(1)
.repeat(1, num_points)
.view(num_points * batch_size) * map_num_elements)
Bs_unravel = Bs_unravel + Bs_flatten[:, 0] * map_size[1] + Bs_flatten[:, 1]
values_B = torch.index_select(map_values.view(-1), dim=0, index=Bs_unravel)
values_B = values_B.view(batch_size, num_points)
print(values_B)
# tensor([[ 10, 20, 29],
# [-69, -59, -50]])
# Compute result:
res = torch.abs(values_A.unsqueeze(-1).expand_as(values_B) - values_B)
print(res)
# tensor([[ 9, 19, 28],
# [ 9, 19, 28]])
res = torch.sum(res, dim=1)
print(res)
# tensor([56, 56])
I want to define some function on a matrix X. For example mean(pow(X - X0, 2)), where X0 is another matrix (X0 is fixed / constant). To make it more specific, let's assume that both X and X0 are 10 x 10 matrices. The result of the operation is a real number.
Now I have a big matrix (let's say 500 x 500). I want to apply the operation defined above to all 10 x 10 sub-matrices of the "big" matrix. In other words, I want to slide the 10 x 10 window over the "big" matrix. For each location of the window, I should get a real number. So, as a final result, I need to get a real-valued matrix (or 2D tensor) (its shape should be 491 x 491).
What I want to have is close to a convolutional layer but not exactly the same because I want to use a mean squared deviation instead of a linear function represented by a neuron.
This is only a Numpy solution, hope it suffices.
I assume that your function is made up of an operation on the matrix elements and of a mean, i.e. a scaled sum. Hence, it is sufficient to look at Y as in
Y = np.power(X-X0, 2)
So we only need to deal with determining a windowed mean. Note that for the 1D case the matrix product with an appropriate vector of ones can be determined for calculating the mean, e.g.
h = np.array([0, 1, 1, 0]) # same dimension as y
m1 = np.dot(h, y) / 2
m2 = (y[1] + y[2]) / 2
print(m1 == m2) # True
The 2D case is analogous, but with two matrix multiplications, one for the rows and one for the columns. E.g.
m_2 = np.dot(np.dot(h, Y), h) / 2**2
To construct a sliding window, we need to build a matrix of shifted windows, e.g.
H = [[1, 1, 1, 0, 0, ..., 0],
[0, 1, 1, 1, 0, ..., 0],
.
.
.
[0, ..., 0, 0, 1, 1, 1]]
to calculate all the sums
S = np.dot(np.dot(H, Y), H.T)
A full example for a (n, n) matrix with a (m, m) window would be
import numpy as np
n, m = 500, 10
X0 = np.ones((n, n))
X = np.random.rand(n, n)
Y = np.power(X-X0, 2)
h = np.concatenate((np.ones(m), np.zeros(n-m))) # window at position 0
H = np.vstack((np.roll(h, k) for k in range(n+1-m))) # slide the window
M = np.dot(np.dot(H,Y), H.T) / m**2 # calculate the mean
print(M.shape) # (491, 491)
An alternative but probably slightly less efficient way for building H is
H = np.sum(np.diag(np.ones(n-k), k)[:-m+1, :] for k in range(m))
Update
Calculating the mean squared deviation is also possible with that approach. For that, we generalize the vector identity |x-x0|^2 = (x-x0).T (x-x0) = x.T x - 2 x0.T x + x0.T x0 (a space denotes a scalar or matrix multiplication and .T a transposed vector) to the matrix case:
We assume W is a (m,n) matrix containing a block (m.m) identity matrix, which is able to extract the (k0,k1)-th (m,m) sub-matrix by Y = W Z W.T, where Z is the (n,n) matrix containing the data. Calculating the difference
D = Y - X0 = Y = W Z W.T - X0
is straightforward, where X0 and D is a (m,m) matrix. The square-root of the squared sum of the elements is called Frobenius norm. Based on those identities, we can write the squared sum as
s = sum_{i,j} D_{i,j}^2 = trace(D.T D) = trace((W Z W.T - X0).T (H Z H.T - X0))
= trace(W Z.T W.T W Z W.T) - 2 trace(X0.T W Z W.T) + trace(X0.T X0)
=: Y0 + Y1 + Y2
The term Y0 can be interpreted as H Z H.T from the method from above.The term Y1 can be interpreted as a weighted mean on Z and Y2 is a constant, which only needs to be determined once.
Thus, a possible implementation would be:
import numpy as np
n, m = 500, 10
x0 = np.ones(m)
Z = np.random.rand(n, n)
Y0 = Z**2
h0 = np.concatenate((np.ones(m), np.zeros(n-m)))
H0 = np.vstack((np.roll(h0, k) for k in range(n+1-m)))
M0 = np.dot(np.dot(H0, Y0), H0.T)
h1 = np.concatenate((-2*x0, np.zeros(n-m)))
H1 = np.vstack((np.roll(h1, k) for k in range(n+1-m)))
M1 = np.dot(np.dot(H1, Z), H0.T)
Y2 = np.dot(x0, x0)
M = (M0 + M1) / m**2 + Y2