python tensorflow l2 loss over axis

I am using Python 3 with TensorFlow.
I have a matrix where each row is a vector, and I want a distance matrix computed with the L2 norm: each value in the matrix will be the distance between two rows,
e.g.
D_ij = l2_distance(M(i,:), M(j,:))
Thanks
edit:
This is not a duplicate: that other question is about computing the norm of each row of a matrix, whereas I need the pairwise distance between each row and every other row.

This answer shows how to compute the pair-wise sum of squared differences between a collection of vectors. By simply post-composing with the square root, you arrive at your desired pair-wise distances:
M = tf.constant([[0, 0], [2, 2], [5, 5]], dtype=tf.float64)
r = tf.reduce_sum(M*M, 1)
r = tf.reshape(r, [-1, 1])
D2 = r - 2*tf.matmul(M, tf.transpose(M)) + tf.transpose(r)
D = tf.sqrt(D2)
with tf.Session() as sess:
    print(sess.run(D))
# [[0.         2.82842712 7.07106781]
#  [2.82842712 0.         4.24264069]
#  [7.07106781 4.24264069 0.        ]]
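If you are on TF 2.x with eager execution, a minimal sketch of the same idea looks like the following; it is an adaptation of the answer above, not the original code, and tf.maximum guards against tiny negative values that rounding can produce before the square root:
import tensorflow as tf

M = tf.constant([[0., 0.], [2., 2.], [5., 5.]], dtype=tf.float64)
r = tf.reshape(tf.reduce_sum(M * M, axis=1), [-1, 1])    # squared row norms as a column
D2 = r - 2 * tf.matmul(M, M, transpose_b=True) + tf.transpose(r)
D = tf.sqrt(tf.maximum(D2, 0.0))                         # clamp small negative round-off
print(D.numpy())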

You can write a TensorFlow operation based on the formula of Euclidean distance (L2 loss).
distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x1, x2))))
A sample would be:
import tensorflow as tf
x1 = tf.constant([1, 2, 3], dtype=tf.float32)
x2 = tf.constant([4, 5, 6], dtype=tf.float32)
distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x1, x2))))
with tf.Session() as sess:
    print(sess.run(distance))
As pointed out by @fuglede, if you want to output the pairwise distances, then we can use
tf.sqrt(tf.square(tf.subtract(x1, x2)))
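If x1 and x2 are instead matrices of row vectors and you want the full (m, n) distance matrix, one possible broadcasting sketch (assuming TF 2.x eager mode, and accepting the temporary (m, n, d) tensor it creates in memory) is:
import tensorflow as tf

X = tf.constant([[0., 0.], [2., 2.]])                    # shape (m, d)
Y = tf.constant([[1., 1.], [5., 5.], [0., 3.]])          # shape (n, d)
diff = tf.expand_dims(X, 1) - tf.expand_dims(Y, 0)       # shape (m, n, d)
D = tf.sqrt(tf.reduce_sum(tf.square(diff), axis=-1))     # shape (m, n)
print(D.numpy())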


Complicated vector multiplication without iterating through the vector

I'm trying to calculate a loss value in a variation of multiclass classification.
I have my y tensor (the values correspond to the classes):
y = torch.tensor([ 1, 0, 2])
My y_pred is a 3x3 matrix of probability distributions:
y_pred = torch.tensor([[0.4937, 0.2657, 0.2986],
                       [0.2553, 0.3845, 0.4384],
                       [0.2510, 0.3498, 0.2630]])
The complication is that I also have a distance matrix (each class has some distance to other classes):
d_mtx = torch.tensor([[0,      0.7256, 0.7433],
                      [0.6281, 0,      0.1171],
                      [0.7580, 0.2513, 0]])
The loss that I'm trying to calculate is:
loss = 0
for class_value in range(len(y)):
    dis = torch.dot(d_mtx[y[class_value]], y_pred[class_value])
    loss += dis
Is there a way to calculate it efficiently without the iteration?
Update 1:
Tried @Yahia Zakaria's approach and it works if my y_pred has the same size as my d_mtx, but otherwise I get an error:
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0
For example:
y = torch.tensor([ 1, 0, 2, 1])
y_pred = torch.tensor([[0.4937, 0.2657, 0.2986],
                       [0.2553, 0.3845, 0.4384],
                       [0.2510, 0.3498, 0.2630],
                       [0.2510, 0.3498, 0.2630]])
d_mtx = torch.tensor([[0,      0.7256, 0.7433],
                      [0.6281, 0,      0.1171],
                      [0.7580, 0.2513, 0]])
You could do it like this:
loss = (d_mtx[y] * y_pred).sum()
This solution assumes the y is of type torch.int64 which is valid for the example you have shown.
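As a quick sanity check, a small sketch (reusing the tensors from the question, including the 4-row y_pred) confirming that the one-liner reproduces the loop:
import torch

y = torch.tensor([1, 0, 2, 1])
y_pred = torch.tensor([[0.4937, 0.2657, 0.2986],
                       [0.2553, 0.3845, 0.4384],
                       [0.2510, 0.3498, 0.2630],
                       [0.2510, 0.3498, 0.2630]])
d_mtx = torch.tensor([[0,      0.7256, 0.7433],
                      [0.6281, 0,      0.1171],
                      [0.7580, 0.2513, 0]])

# Loop version from the question.
loss_loop = sum(torch.dot(d_mtx[y[i]], y_pred[i]) for i in range(len(y)))
# Vectorized version: d_mtx[y] has shape (len(y), n_classes), the same as y_pred.
loss_vec = (d_mtx[y] * y_pred).sum()
print(torch.isclose(loss_loop, loss_vec))  # tensor(True)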

Is there a vectorized way to sample multiples times with np.random.choice() with differents p?

I'm trying to implement a variation ratio, and I need T samples from an array C, but each sample has different weights p_t.
I'm using this:
import numpy as np
from scipy import stats
batch_size = 1
T = 3
C = np.array(['A', 'B', 'C'])
# p_batch_T dimensions: (batch, sample, class)
p_batch_T = np.array([[[0.01, 0.98, 0.01],
                       [0.3,  0.15, 0.55],
                       [0.85, 0.1,  0.05]]])
def variation_ratio(C, p_T):
    # This function works only with one sample from the batch.
    Y_T = np.array([np.random.choice(C, size=1, p=p_t) for p_t in p_T])  # vectorize this
    C_mode, f = stats.mode(Y_T)
    T = len(Y_T)
    return 1.0 - (f / T)
def variation_ratio_batch(C, p_batch_T):
    return np.array([variation_ratio(C, p_T) for p_T in p_batch_T])  # and vectorize this
Is there a way to implement these functions without any for loop?
Instead of sampling with the given distribution p_T, we can sample uniformly in [0, 1] and compare that to the cumulative distribution:
Let's start with Y_T, say for p_T = p_batch_T[0]
cum_dist = p_batch_T.cumsum(axis=-1)
idx_T = (np.random.rand(len(C),1) < cum_dist[0]).argmax(-1)
Y_T = C[idx_T[...,None]]
_, f = stats.mode(Y_T) # here axis=0 is default
Now let's take that to variation_ratio_batch:
idx_T = (np.random.rand(len(p_batch_T), len(C),1) < cum_dist).argmax(-1)
Y = C[idx_T[...,None]]
_, f = stats.mode(Y, axis=1)  # notice axis 0 is batch
out = 1 - (f/T)
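Putting the pieces together, a consolidated sketch of this approach (one uniform draw per sample; it assumes the example C and p_batch_T from the question):
import numpy as np
from scipy import stats

C = np.array(['A', 'B', 'C'])
p_batch_T = np.array([[[0.01, 0.98, 0.01],
                       [0.3,  0.15, 0.55],
                       [0.85, 0.1,  0.05]]])

T = p_batch_T.shape[1]
cum_dist = p_batch_T.cumsum(axis=-1)              # (batch, T, n_classes)
u = np.random.rand(p_batch_T.shape[0], T, 1)      # one uniform draw per sample
idx = (u < cum_dist).argmax(-1)                   # (batch, T) sampled class indices
Y = C[idx]                                        # sampled labels, for inspection
_, f = stats.mode(idx, axis=1)                    # modal frequency per batch row
out = 1.0 - f.ravel() / T
print(Y, out)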
You could do it this way:
First, create a 2D weights array of shape (T, len(C)) and take the cumulative sum:
n_rows = 5
n_cols = 3
weights = np.random.rand(n_rows, n_cols)
cum_weights = (weights / weights.sum(axis=1, keepdims=True)).cumsum(axis=1)
cum_weights might look like this:
array([[0.09048919, 0.58962127, 1.        ],
       [0.36333997, 0.58380885, 1.        ],
       [0.28761923, 0.63413879, 1.        ],
       [0.39446498, 0.98760834, 1.        ],
       [0.27862476, 0.79715149, 1.        ]])
Next, we can compare cum_weights to the appropriately sized output of np.random.rand. Taking argmin gives, for each row, the first index at which the cumulative weight is no longer below the random number drawn for that row:
indices = (cum_weights < np.random.rand(n_rows, 1)).argmin(axis=1)
We can then use indices to index an array of values of shape (n_cols,), which is len(C) in your original example.
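For completeness, a runnable sketch of those steps end to end (assuming C is the array of values to draw from, as in the question):
import numpy as np

rng = np.random.default_rng(0)
C = np.array(['A', 'B', 'C'])
n_rows, n_cols = 5, len(C)

weights = rng.random((n_rows, n_cols))
cum_weights = (weights / weights.sum(axis=1, keepdims=True)).cumsum(axis=1)

# First column where the cumulative weight reaches the uniform draw for that row.
indices = (cum_weights < rng.random((n_rows, 1))).argmin(axis=1)
samples = C[indices]  # one weighted draw per row
print(samples)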
np.vectorize should work:
from functools import partial
import numpy as np
@partial(np.vectorize, excluded=['rng'], signature='(),(k)->()')
def choice_batched(rng, probs):
    return rng.choice(a=probs.shape[-1], p=probs)
then
num_classes = 3
batch_size = 5
alpha = .5 # Dirichlet prior hyperparameter.
rng = np.random.default_rng()
probs = np.random.dirichlet(alpha=np.full(fill_value=alpha, shape=num_classes), size=batch_size)
# Check each row sums to 1.
assert np.allclose(probs.sum(axis=-1), 1)
print(choice_batched(rng, probs))
print(choice_batched(rng, probs))
print(choice_batched(rng, probs))
print(choice_batched(rng, probs))
gives
[2 0 0 0 1]
[1 0 0 0 1]
[2 0 2 0 1]
[1 0 0 0 0]
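Worth noting: np.vectorize is essentially a convenience wrapper around a Python-level loop, so this mainly buys readability rather than raw speed.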
Here is my implementation of Quang's and gmds' solutions:
def sample(ws, k):
    """Weighted sample k elements along the last axis.

    ws -- Tensor of probabilities, shape (*, n)
    k  -- Number of elements to sample.

    Returns tensor of shape (*, k) with values in {0, ..., n-1}.
    """
    assert np.allclose(ws.sum(-1), 1)
    cs = ws.cumsum(-1)
    ps = np.random.random(ws.shape[:-1] + (k,))
    return (cs[..., None, :] < ps[..., None]).sum(-1)
Say we have some stuff
>>> stuff = np.array([[0, 1, 2],
                      [3, 4, 5],
                      [6, 7, 8]])
And some weights / sampling probabilities.
>>> ws = np.array([[0.41296038, 0.36070229, 0.22633733],
                   [0.37576672, 0.14518771, 0.47904557],
                   [0.14742326, 0.29182459, 0.56075215]])
And we want to sample 2 elements along each row. Then we do
>>> ids = sample(ws, 2)
[[2, 0],
 [1, 2],
 [2, 2]]
And we can retrieve the sampled values from stuff using np.take_along_axis:
>>> np.take_along_axis(stuff, ids, axis=1)
[[2, 0],
 [4, 5],
 [8, 8]]
The code could be generalized to sampling along an axis other than the last one, but I got confused about broadcasting, so somebody else should have a stab at it!
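One possible generalization, as a sketch only (it assumes that moving the sampling axis to the end, sampling, and moving the result back is all that is needed):
import numpy as np

def sample_along(ws, k, axis=-1):
    """Weighted sample of k indices along `axis` of ws."""
    ws = np.moveaxis(ws, axis, -1)                    # put the sampling axis last
    cs = ws.cumsum(-1)
    ps = np.random.random(ws.shape[:-1] + (k,))
    idx = (cs[..., None, :] < ps[..., None]).sum(-1)  # shape (*, k)
    return np.moveaxis(idx, -1, axis)                 # k now sits where `axis` was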

Compute a kernel matrix with a custom kernel function

Is there a way to create something like a correlation matrix, but with a different function?
Starting from this:
X = array([[1, 1, 1],
           [2, 2, 2],
           [3, 3, 3]])
which is in the shape: (n_samples, n_features) and turn it into something like this:
array([[f(X[0], X[0]), f(X[0], X[1]), f(X[0], X[2])],
       [f(X[1], X[0]), f(X[1], X[1]), f(X[1], X[2])],
       [f(X[2], X[0]), f(X[2], X[1]), f(X[2], X[2])]])
thanks!
which is essentially every sample passed to the function together with every other sample.
So the way I currently solve it is with a nested loop:
for i in range(samples):
    for j in range(samples):
        r = test_kernel(X[i], X[j])
        output[i, j] = r
but I doubt that's the most efficient way to do it; since the matrix is symmetric, I end up doing many of the calculations twice.
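A minimal sketch that at least halves the work by exploiting that symmetry (it assumes test_kernel(a, b) == test_kernel(b, a) and reuses the names from the question):
import numpy as np

n = len(X)
output = np.empty((n, n))
for i in range(n):
    for j in range(i, n):            # upper triangle only
        output[i, j] = test_kernel(X[i], X[j])
        output[j, i] = output[i, j]  # mirror to the lower triangle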
As far as I understand your question, you are looking for a way to use a custom metric to build a kernel matrix from your samples. You can use pairwise_kernels from sklearn.metrics.pairwise:
Vector as input
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels
x = np.array([[1],[2],[3]])
print('Input vector:\n', x)
kernel_default = pairwise_kernels(x)
print('Default metric - linear kernel (dot product) as kernel function:\n', kernel_default)
def custom_kernel(x, y):
    # Here you can define your custom transform, e.g.:
    return x**3 + y**3
kernel_custom = pairwise_kernels(x, metric=custom_kernel)
print('Some custom norm, which has no meaning...:\n', kernel_custom)
results in
Input vector:
[[1]
[2]
[3]]
Default metric - linear kernel (dot product) as kernel function:
[[1. 2. 3.]
[2. 4. 6.]
[3. 6. 9.]]
Some custom norm, which has no meaning...:
[[ 2. 9. 28.]
[ 9. 16. 35.]
[28. 35. 54.]]
Multi-dimensional input
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels
x = np.array([[1, 1],[2, 2],[3, 3]])
print('Input vector:\n', x)
def custom_kernel(x, y):
    # Here you can define your custom transform, e.g.:
    return np.sum(x)**3 + np.sum(y)**3
kernel_custom = pairwise_kernels(x, metric=custom_kernel)
print('Some custom norm, which has no meaning...:\n', kernel_custom)
results in
Input vector:
[[1 1]
[2 2]
[3 3]]
Some custom norm, which has no meaning...:
[[ 16. 72. 224.]
[ 72. 128. 280.]
[224. 280. 432.]]
np.array([[f(X[i], X[j]) for j in range(len(X))] for i in range(len(X))])?
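Spelled out as a runnable sketch (with a hypothetical dot-product kernel f standing in for the real test_kernel):
import numpy as np

X = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])

def f(a, b):
    # Placeholder kernel; swap in your own test_kernel here.
    return np.dot(a, b)

K = np.array([[f(X[i], X[j]) for j in range(len(X))] for i in range(len(X))])
print(K)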

how to get covariance matrix in tensorflow?

How could I get a covariance matrix in TensorFlow, like numpy.cov() in NumPy?
For example, I want to get the covariance matrix of tensor A; right now I have to use numpy instead:
A = sess.run(model.A, feed)
cov = np.cov(np.transpose(A))
Is there any way to get cov with TensorFlow instead of numpy?
This is different from the question how to compute covariance in tensorflow, where the problem is to compute the covariance of two vectors, while mine is to compute the covariance matrix of a matrix (a 2D tensor) efficiently using the TensorFlow API.
This is months late but anyway posting for completeness.
import numpy as np
import tensorflow as tf
def tf_cov(x):
    mean_x = tf.reduce_mean(x, axis=0, keep_dims=True)
    mx = tf.matmul(tf.transpose(mean_x), mean_x)
    vx = tf.matmul(tf.transpose(x), x) / tf.cast(tf.shape(x)[0], tf.float32)
    cov_xx = vx - mx
    return cov_xx
data = np.array([[1., 4, 2], [5, 6, 24], [15, 1, 5], [7, 3, 8], [9, 4, 7]])
with tf.Session() as sess:
    print(sess.run(tf_cov(tf.constant(data, dtype=tf.float32))))
## validating with numpy solution
pc = np.cov(data.T, bias=True)
print(pc)
Answering from 2019. Tensorflow probability now supports effortless correlation and covariance.
https://www.tensorflow.org/probability/api_docs/python/tfp/stats/covariance
x = tf.random_normal(shape=(100, 2, 3))
y = tf.random_normal(shape=(100, 2, 3))
# cov[i, j] is the sample covariance between x[:, i, j] and y[:, i, j].
cov = tfp.stats.covariance(x, y, sample_axis=0, event_axis=None)
# cov_matrix[i, m, n] is the sample covariance of x[:, i, m] and y[:, i, n]
cov_matrix = tfp.stats.covariance(x, y, sample_axis=0, event_axis=-1)
Equivalent to np.cov(data):
import tensorflow as tf
import numpy as np
data = np.array([[1., 4, 2], [5, 6, 24], [15, 1, 5], [7,3,8], [9,4,7]])
def tf_cov(x):
    x = x - tf.expand_dims(tf.reduce_mean(x, axis=1), 1)
    fact = tf.cast(tf.shape(x)[1] - 1, tf.float32)
    return tf.matmul(x, tf.conj(tf.transpose(x))) / fact
with tf.Session() as sess:
    print(sess.run(tf_cov(tf.constant(data, dtype=tf.float32))))
Following up on @Souradeep Nanda: if you experiment with it, you'll find that tfp.stats.covariance gives only half the value (elementwise) of np.cov(..., rowvar=False), so you will have to multiply by 2 after the calculation. (This applies to v0.11.1, tested on a 2x2 matrix.)
For a 3x3 matrix the values are NOT EQUIVALENT, so perhaps you might want to stay with np.cov. This also applies if you're not using rowvar=False for the np.cov() version. I am not sure why.
We can use tfp, aka tensorflow-probability, to compute the cov matrix:
import tensorflow_probability as tfp
x = tf.random.normal(shape=(3, 3))
cov = tfp.stats.covariance(x)
## which is the same as:
np_cov = np.cov(tf.transpose(x), bias=True)
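If in doubt on a particular version, a quick way to check is to compare the two directly on your own setup; a minimal sketch, assuming TF 2.x eager mode with tensorflow-probability installed:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

x = tf.random.normal(shape=(5, 3))
tfp_cov = tfp.stats.covariance(x)                    # samples on axis 0, features on the last axis
np_cov = np.cov(x.numpy(), rowvar=False, bias=True)  # biased estimate, features as columns
print(np.allclose(tfp_cov.numpy(), np_cov, atol=1e-5))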

Compute pairwise distance in a batch without replicating tensor in Tensorflow?

I want to compute the pairwise squared distance of a batch of features in TensorFlow. I have a simple implementation using + and * operations by
tiling the original tensor:
def pairwise_l2_norm2(x, y, scope=None):
    with tf.op_scope([x, y], scope, 'pairwise_l2_norm2'):
        size_x = tf.shape(x)[0]
        size_y = tf.shape(y)[0]
        xx = tf.expand_dims(x, -1)
        xx = tf.tile(xx, tf.pack([1, 1, size_y]))
        yy = tf.expand_dims(y, -1)
        yy = tf.tile(yy, tf.pack([1, 1, size_x]))
        yy = tf.transpose(yy, perm=[2, 1, 0])
        diff = tf.sub(xx, yy)
        square_diff = tf.square(diff)
        square_dist = tf.reduce_sum(square_diff, 1)
        return square_dist
This function takes as input two matrices of size (m,d) and (n,d) and computes the squared distance between each pair of row vectors. The output is a matrix of size (m,n) with element 'd_ij = dist(x_i, y_j)'.
The problem is that I have a large batch and high-dimensional features ('m', 'n', 'd' are all large), so replicating the tensor consumes a lot of memory.
I'm looking for another way to implement this without increasing the memory usage, storing only the final distance tensor. Kind of double looping over the original tensor.
You can use some linear algebra to turn it into matrix ops. Note that what you need is the matrix D where, if a[i] is the ith row of your original matrix,
D[i,j] = (a[i]-a[j])(a[i]-a[j])'
You can rewrite that into
D[i,j] = r[i] - 2 a[i]a[j]' + r[j]
Where r[i] is squared norm of ith row of the original matrix.
In a system that supports standard broadcasting rules you can treat r as a column vector and write D as
D = r - 2 A A' + r'
In TensorFlow you could write this as
A = tf.constant([[1, 1], [2, 2], [3, 3]])
r = tf.reduce_sum(A*A, 1)
# turn r into column vector
r = tf.reshape(r, [-1, 1])
D = r - 2*tf.matmul(A, tf.transpose(A)) + tf.transpose(r)
sess = tf.Session()
sess.run(D)
result
array([[0, 2, 8],
       [2, 0, 2],
       [8, 2, 0]], dtype=int32)
Using squared_difference:
def squared_dist(A):
    expanded_a = tf.expand_dims(A, 1)
    expanded_b = tf.expand_dims(A, 0)
    distances = tf.reduce_sum(tf.squared_difference(expanded_a, expanded_b), 2)
    return distances
One thing I noticed is that this solution using tf.squared_difference gives me out of memory (OOM) for very large vectors, while the approach by @YaroslavBulatov doesn't. So, I think decomposing the operation yields a smaller memory footprint (which I thought squared_difference would handle better under the hood).
Here is a more general solution for two tensors of coordinates A and B:
def squared_dist(A, B):
    assert A.shape.as_list() == B.shape.as_list()
    row_norms_A = tf.reduce_sum(tf.square(A), axis=1)
    row_norms_A = tf.reshape(row_norms_A, [-1, 1])  # Column vector.
    row_norms_B = tf.reduce_sum(tf.square(B), axis=1)
    row_norms_B = tf.reshape(row_norms_B, [1, -1])  # Row vector.
    return row_norms_A - 2 * tf.matmul(A, tf.transpose(B)) + row_norms_B
Note that this is the square distance. If you want to change this to the Euclidean distance, perform a tf.sqrt on the result. If you want to do that, don't forget to add a small constant to compensate for the floating point instabilities: dist = tf.sqrt(squared_dist(A, B) + 1e-6).
If you want to compute it with another method, change the order of the tf operations.
def compute_euclidean_distance(x, y):
    size_x = x.shape.dims[0]
    size_y = y.shape.dims[0]
    for i in range(size_x):
        tile_one = tf.reshape(tf.tile(x[i], [size_y]), [size_y, -1])
        eu_one = tf.expand_dims(tf.sqrt(tf.reduce_sum(tf.pow(tf.subtract(tile_one, y), 2), axis=1)), axis=0)
        if i == 0:
            d = eu_one
        else:
            d = tf.concat([d, eu_one], axis=0)
    return d
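A related, memory-friendlier variant is to build the distance matrix one row at a time with tf.map_fn instead of a Python loop; this is only a sketch, assuming the TF 1.x graph style used throughout this thread, and it avoids materializing the full (m, n, d) tiled tensor:
import tensorflow as tf

def pairwise_dist_rows(x, y):
    # x: (m, d), y: (n, d) -> (m, n) Euclidean distances, computed one row per step.
    def one_row(xi):
        return tf.sqrt(tf.reduce_sum(tf.square(xi - y), axis=1))
    return tf.map_fn(one_row, x)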
