I'm trying to calculate a loss value in a variation of multiclass classification.
I have my y tensor (the values correspond to the classes):
y = torch.tensor([ 1, 0, 2])
My y_pred is a 3x3 matrix of probability distributions:
y_pred = torch.tensor([[0.4937, 0.2657, 0.2986],
                       [0.2553, 0.3845, 0.4384],
                       [0.2510, 0.3498, 0.2630]])
The complication is that I also have a distance matrix (each class has some distance to other classes):
d_mtx = torch.tensor([[0, 0.7256, 0.7433],
                      [0.6281, 0, 0.1171],
                      [0.7580, 0.2513, 0]])
The loss that I'm trying to calculate is:
loss = 0
for class_value in range(len(y)):
    dis = torch.dot(d_mtx[y[class_value]], y_pred[class_value])
    loss += dis
Is there a way to calculate it efficiently without the iteration?
Update 1:
I tried @Yahia Zakaria's approach, and it works when y_pred has the same shape as d_mtx, but otherwise I get an error:
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0
For example:
y = torch.tensor([ 1, 0, 2, 1])
y_pred = torch.tensor([[0.4937, 0.2657, 0.2986],
                       [0.2553, 0.3845, 0.4384],
                       [0.2510, 0.3498, 0.2630],
                       [0.2510, 0.3498, 0.2630]])
d_mtx = torch.tensor([[0, 0.7256, 0.7433],
                      [0.6281, 0, 0.1171],
                      [0.7580, 0.2513, 0]])
You could do it like that:
loss = (d_mtx[y] * y_pred).sum()
This solution assumes that y has dtype torch.int64, which is the case in the example you have shown.
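To see why the one-liner matches the loop, here is a minimal sketch using the four-sample data from "Update 1". It uses NumPy arrays as a stand-in for the torch tensors (my assumption, to keep the check dependency-light); integer-array indexing gathers rows the same way in both libraries. The names loop_loss, gathered and vec_loss are just illustrative.

```python
import numpy as np

# Values from the "Update 1" example: 4 samples, 3 classes.
y = np.array([1, 0, 2, 1])
y_pred = np.array([[0.4937, 0.2657, 0.2986],
                   [0.2553, 0.3845, 0.4384],
                   [0.2510, 0.3498, 0.2630],
                   [0.2510, 0.3498, 0.2630]])
d_mtx = np.array([[0.0, 0.7256, 0.7433],
                  [0.6281, 0.0, 0.1171],
                  [0.7580, 0.2513, 0.0]])

# Reference: the explicit loop from the question.
loop_loss = 0.0
for n in range(len(y)):
    loop_loss += np.dot(d_mtx[y[n]], y_pred[n])

# Vectorized: d_mtx[y] gathers one distance row per sample, shape (4, 3).
gathered = d_mtx[y]
vec_loss = (gathered * y_pred).sum()

print(gathered.shape)                    # (4, 3)
print(np.isclose(loop_loss, vec_loss))   # True
```

Because the gathered matrix has one row per sample, the element-wise product lines up with y_pred even when the number of samples differs from the number of classes.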
I am trying to generate a diagonal matrix from linear regression coefficients. First I created a zero matrix, then I extracted the coefficients from the regression model. Here's my code:
P = np.zeros((ncol, ncol), dtype = int)
intercep = np.zeros((1, ncol), dtype = int)
my_pls = PLSRegression(n_components = ncomp, scale=False)
model = my_pls.fit(x, y)
# extract PLS coefficients:
coef = model.coef_
intercep = model.y_mean_ - (model.x_mean_.dot(coef))
P[(i-k):(i+k), i-k] = np.diag(coef[0:ncol])
But I got zero matrices after running the code. Can anyone please help me out with how to get the diagonal matrix from the regression coefficient?
Not sure why you need to declare P.
You can build a diagonal matrix (with zeros off the diagonal) directly from a 1-D list or vector using numpy.diag:
import numpy
x = [3, 5, 6, 7]
numpy.diag(x)
Output:
array([[3, 0, 0, 0],
[0, 5, 0, 0],
[0, 0, 6, 0],
[0, 0, 0, 7]])
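For your case, try P = np.diag(coef). Note that np.diag only builds a diagonal matrix from a 1-D input; if coef is 2-D, flatten it first, e.g. P = np.diag(coef.ravel()).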
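A small sketch of the two pitfalls that can bite here, assuming coef comes back as a 2-D array with one column (the values below are made up for illustration, not real PLS output). It also shows that writing floats into an integer array truncates them, which may be one reason the original P looked like a zero matrix.

```python
import numpy as np

# Hypothetical stand-in for model.coef_: a 2-D array with one column.
coef = np.array([[0.5], [-1.2], [3.0]])

# On 2-D input np.diag extracts the diagonal instead of building a matrix.
extracted = np.diag(coef)

# Flatten to 1-D first to get a square diagonal matrix.
P = np.diag(coef.ravel())

# Assigning floats into an int array truncates them toward zero.
P_int = np.zeros((3, 3), dtype=int)
P_int[np.diag_indices(3)] = coef.ravel()

print(extracted.shape)   # (1,)
print(P.shape)           # (3, 3)
print(np.diag(P_int))    # [ 0 -1  3]
```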
I'm trying to compute the cosine similarity between 350k sentences using tensorflow.
My sentences are first vectorised using sklearn:
doc = df['text']
vec = TfidfVectorizer(binary=False,norm='l2',use_idf=False,smooth_idf=False,lowercase=True,stop_words='english',min_df=1,max_df=1.0,max_features=None,ngram_range=(1, 1))
X = vec.fit_transform(doc)
print(X.shape)
print(type(X))
This works very well and I get a sparse matrix back. I then tried two ways of converting the sparse matrix to a dense one.
(1) I tried this:
dense = X.toarray()
This works only with a small amount of data (around 10k sentences) and fails on the full dataset.
(2) I have also tried converting X this way, but I get the same error message at the first step (K):
K = tf.convert_to_tensor(X, dtype=None, dtype_hint=None, name=None)
Y = tf.sparse.to_dense(K, default_value=None, validate_indices=True, name=None)
Any tips/ tricks to solve this mystery would be greatly appreciated. Also happy to consider batching my computations if that should be more efficient in terms of size?
You need to make a TensorFlow sparse matrix from your SciPy one. Since your matrix seems to be in CSR format, you can do it as follows:
import numpy as np
import scipy.sparse
import tensorflow as tf
def sparse_csr_to_tf(csr_mat):
    # indptr[i]:indptr[i + 1] is the slice of stored values belonging to row i
    indptr = tf.constant(csr_mat.indptr, dtype=tf.int64)
    elems_per_row = indptr[1:] - indptr[:-1]
    # One row index per stored value
    i = tf.repeat(tf.range(csr_mat.shape[0], dtype=tf.int64), elems_per_row)
    j = tf.constant(csr_mat.indices, dtype=tf.int64)
    # Stack with tf.stack so the indices stay TensorFlow tensors
    indices = tf.stack([i, j], axis=-1)
    data = tf.constant(csr_mat.data)
    return tf.sparse.SparseTensor(indices, data, csr_mat.shape)
# Test
m = scipy.sparse.csr_matrix([
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [2, 0, 3, 4],
], dtype=np.float32)
tf_mat = sparse_csr_to_tf(m)
tf.print(tf.sparse.to_dense(tf_mat))
# [[0 0 1 0]
#  [0 0 0 0]
#  [2 0 3 4]]
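If the indptr arithmetic is not obvious, the following NumPy-only sketch walks through the same CSR-to-(row, column) conversion for the test matrix above. The CSR arrays are written out by hand, and the variable names (rows, coo_indices, dense) are my own for illustration.

```python
import numpy as np

# CSR arrays for the 3x4 test matrix above, written out by hand.
indptr = np.array([0, 1, 1, 4])        # row i owns data[indptr[i]:indptr[i + 1]]
indices = np.array([2, 0, 2, 3])       # column index of each stored value
data = np.array([1.0, 2.0, 3.0, 4.0])
shape = (3, 4)

# Number of stored values per row, then one row index per stored value.
elems_per_row = np.diff(indptr)
rows = np.repeat(np.arange(shape[0]), elems_per_row)

# (row, col) pairs, the layout a coordinate-style sparse tensor expects.
coo_indices = np.stack([rows, indices], axis=-1)

# Rebuild the dense matrix to check the conversion.
dense = np.zeros(shape)
dense[rows, indices] = data
print(coo_indices)
print(dense)
```

The empty second row contributes a zero to elems_per_row, so it simply produces no index pairs.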
I am trying to build a classifier that has two classes (VALID and INVALID).
My ground-truth target values (y_true) are y_true = [0, 1, 1, 0, 1]
The estimated targets (y_pred) returned by the classifier are y_pred = [9.483586549758911133e-01, 7.377880215644836426e-01, 9.916032552719116211e-01, 2.021863758563995361e-01, 1.784837543964385986e-01]
How can I convert these y_pred values to 0 or 1? I used a cutoff of 0.5, treating any value below 0.5 as 0 and the rest as 1, but that gives a very low F1 score. When I skip the cutoff and call classification_report(y_true, y_pred) directly, it shows an almost perfect F1 score.
So I don't understand how to turn the y_pred values into 0/1 target labels.
Here you go:
>>> import numpy as np
>>> y_pred = [9.483586549758911133e-01, 7.377880215644836426e-01, 9.916032552719116211e-01, 2.021863758563995361e-01, 1.784837543964385986e-01]
>>> np.where(np.array(y_pred) >= 0.5, 1, 0)
array([1, 1, 1, 0, 0])
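To check what F1 the thresholded labels actually give, here is a rough sketch that computes precision, recall and F1 for the positive class by hand with NumPy. The scores are rounded copies of the values in the question, and the names y_hat, tp, fp and fn are mine.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])
# Rounded copies of the scores from the question.
y_prob = np.array([0.9484, 0.7378, 0.9916, 0.2022, 0.1785])

# Threshold the scores at 0.5 to get hard labels.
y_hat = (y_prob >= 0.5).astype(int)

# Confusion counts for the positive class (label 1).
tp = int(np.sum((y_hat == 1) & (y_true == 1)))
fp = int(np.sum((y_hat == 1) & (y_true == 0)))
fn = int(np.sum((y_hat == 0) & (y_true == 1)))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(y_hat.tolist(), tp, fp, fn, round(f1, 3))   # [1, 1, 1, 0, 0] 2 1 1 0.667
```

On these five samples the first and last predictions land on the wrong side of 0.5, so an F1 of about 0.67 is what this cutoff gives.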
I'm trying to implement a max margin loss in TensorFlow.
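The idea is that I have some positive examples, I sample some negative examples, and I want to compute something like
\sum_{i=1}^{B} \sum_{j=1}^{N} \max(0, 1 - s_i^{+} + s_{i,j}^{-})
where B is the size of my batch, N is the number of negative samples I want to use, s_i^{+} is the score of the i-th positive example, and s_{i,j}^{-} is the score of its j-th negative example.
I'm new to TensorFlow and I'm finding it tricky to implement.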
My model computes a vector of scores of dimension B * (N + 1) where I alternate positive samples and negative samples. For instance, for a batch size of 2 and 2 negative examples I have a vector of size 6 with scores for the first positive example at index 0 and for the second positive example at position 3 and scores for negative examples in position 1, 2, 4 and 5.
The ideal would be to get values like [1, 0, 0, 1, 0, 0].
What I came up with is the following, using while loops and conditions:
# Function for computing max margin inner loop
def max_margin_inner(i, batch_examples_t, j, scores, loss):
    idx_pos = tf.mul(i, batch_examples_t)
    score_pos = tf.gather(scores, idx_pos)
    idx_neg = tf.add_n([tf.mul(i, batch_examples_t), j, 1])
    score_neg = tf.gather(scores, idx_neg)
    loss = tf.add(loss, tf.maximum(0.0, 1.0 - score_pos + score_neg))
    tf.add(j, 1)
    return [i, batch_examples_t, j, scores, loss]
# Function for computing max margin outer loop
def max_margin_outer(i, batch_examples_t, scores, loss):
    j = tf.constant(0)
    pos_idx = tf.mul(i, batch_examples_t)
    length = tf.gather(tf.shape(scores), 0)
    neg_smp_t = tf.constant(num_negative_samples)
    cond = lambda i, b, j, bi, lo: tf.logical_and(
        tf.less(j, neg_smp_t),
        tf.less(pos_idx, length))
    tf.while_loop(cond, max_margin_inner, [i, batch_examples_t, j, scores, loss])
    tf.add(i, 1)
    return [i, batch_examples_t, scores, loss]
# compute the loss
with tf.name_scope('max_margin'):
    loss = tf.Variable(0.0, name="loss")
    i = tf.constant(0)
    batch_examples_t = tf.constant(batch_examples)
    condition = lambda i, b, bi, lo: tf.less(i, b)
    max_margin = tf.while_loop(
        condition,
        max_margin_outer,
        [i, batch_examples_t, scores, loss])
The code has two loops, one for the outer sum and one for the inner sum. The problem I'm facing is that the loss variable keeps accumulating at each iteration without being reset, so it doesn't actually work.
Moreover, this doesn't seem to be in line with the TensorFlow way of doing things. I suspect there are better, more vectorized ways to implement it, and I hope someone can suggest options or point me to examples.
First we need to clean the input:
we want an array of positive scores, of shape [B, 1]
we want a matrix of negative scores, of shape [B, N]
import tensorflow as tf
B = 2
N = 2
scores = tf.constant([0.5, 0.2, -0.1, 1., -0.5, 0.3]) # shape B * (N+1)
scores = tf.reshape(scores, [B, N+1])
scores_pos = tf.slice(scores, [0, 0], [B, 1])
scores_neg = tf.slice(scores, [0, 1], [B, N])
Now we only have to compute the matrix of the loss, i.e. all the individual loss for every pair (positive, negative), and compute its sum.
loss_matrix = tf.maximum(0., 1. - scores_pos + scores_neg) # we could also use tf.nn.relu here
loss = tf.reduce_sum(loss_matrix)
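As a sanity check, the same computation can be sketched in NumPy and compared against an explicit double loop over batch items and negatives. This is only an illustration with the example scores above; loss_vec and loss_loop are names I made up.

```python
import numpy as np

B, N = 2, 2
scores = np.array([0.5, 0.2, -0.1, 1.0, -0.5, 0.3]).reshape(B, N + 1)
scores_pos = scores[:, :1]   # shape (B, 1)
scores_neg = scores[:, 1:]   # shape (B, N)

# Vectorized: broadcasting pairs each positive score with its N negatives.
loss_vec = np.maximum(0.0, 1.0 - scores_pos + scores_neg).sum()

# Reference: explicit double loop over batch items and negatives.
loss_loop = 0.0
for i in range(B):
    for j in range(N):
        loss_loop += max(0.0, 1.0 - scores[i, 0] + scores[i, 1 + j])

print(round(float(loss_vec), 4), round(float(loss_loop), 4))   # 1.4 1.4
```

Keeping scores_pos as a (B, 1) column is what lets broadcasting expand it across the N negative columns without any tiling.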
I want to compute the pairwise squared distances for a batch of features in TensorFlow. I have a simple implementation using + and * operations by tiling the original tensor:
def pairwise_l2_norm2(x, y, scope=None):
    with tf.op_scope([x, y], scope, 'pairwise_l2_norm2'):
        size_x = tf.shape(x)[0]
        size_y = tf.shape(y)[0]
        xx = tf.expand_dims(x, -1)
        xx = tf.tile(xx, tf.pack([1, 1, size_y]))
        yy = tf.expand_dims(y, -1)
        yy = tf.tile(yy, tf.pack([1, 1, size_x]))
        yy = tf.transpose(yy, perm=[2, 1, 0])
        diff = tf.sub(xx, yy)
        square_diff = tf.square(diff)
        square_dist = tf.reduce_sum(square_diff, 1)
        return square_dist
This function takes two matrices of size (m, d) and (n, d) as input and computes the squared distance between each pair of row vectors. The output is a matrix of size (m, n) with elements d_ij = dist(x_i, y_j).
The problem is that I have a large batch and high-dimensional features (large m, n, d), so replicating the tensor consumes a lot of memory.
I'm looking for another way to implement this without increasing memory usage, storing only the final distance tensor, somewhat like a double loop over the original tensor.
You can use some linear algebra to turn it into matrix ops. What you need is the matrix D, where a[i] is the i-th row of your original matrix and
D[i,j] = (a[i]-a[j])(a[i]-a[j])'
You can rewrite that into
D[i,j] = r[i] - 2 a[i]a[j]' + r[j]
where r[i] is the squared norm of the i-th row of the original matrix.
In a system that supports standard broadcasting rules you can treat r as a column vector and write D as
D = r - 2 A A' + r'
In TensorFlow you could write this as
A = tf.constant([[1, 1], [2, 2], [3, 3]])
r = tf.reduce_sum(A*A, 1)
# turn r into column vector
r = tf.reshape(r, [-1, 1])
D = r - 2*tf.matmul(A, tf.transpose(A)) + tf.transpose(r)
sess = tf.Session()
sess.run(D)
result
array([[0, 2, 8],
[2, 0, 2],
[8, 2, 0]], dtype=int32)
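The rewrite follows from expanding the product: (a[i]-a[j])(a[i]-a[j])' = a[i]a[i]' - 2 a[i]a[j]' + a[j]a[j]'. As a quick check, here is a NumPy sketch of the same formula compared against a brute-force computation; D_ref is a name I introduced for the reference result.

```python
import numpy as np

A = np.array([[1, 1], [2, 2], [3, 3]])

# Squared row norms as a column vector, shape (3, 1).
r = np.sum(A * A, axis=1).reshape(-1, 1)

# D[i, j] = r[i] - 2 * a[i].a[j] + r[j], via broadcasting.
D = r - 2 * A @ A.T + r.T

# Brute-force reference: squared distance for every pair of rows.
D_ref = np.array([[np.sum((a - b) ** 2) for b in A] for a in A])

print(D)
print(np.array_equal(D, D_ref))   # True
```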
Using squared_difference:
def squared_dist(A):
    expanded_a = tf.expand_dims(A, 1)
    expanded_b = tf.expand_dims(A, 0)
    distances = tf.reduce_sum(tf.squared_difference(expanded_a, expanded_b), 2)
    return distances
One thing I noticed is that this solution using tf.squared_difference gives me out-of-memory (OOM) errors for very large vectors, while the approach by @YaroslavBulatov doesn't. So I think decomposing the operation yields a smaller memory footprint (which I thought squared_difference would handle better under the hood).
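A rough way to see where the extra memory could come from is to compare intermediate array sizes in NumPy. This is only an analogy: it assumes the broadcasted difference is actually materialized, which is true for NumPy but which I have not verified for TensorFlow's kernels. The sizes m and d are arbitrary.

```python
import numpy as np

m, d = 200, 64
rng = np.random.default_rng(0)
A = rng.standard_normal((m, d)).astype(np.float32)

# Broadcasting approach: the difference array has shape (m, m, d).
diff = A[:, None, :] - A[None, :, :]
dist_broadcast = np.sum(diff ** 2, axis=2)

# Matrix-product approach: the largest array has shape (m, m).
r = np.sum(A * A, axis=1).reshape(-1, 1)
dist_gram = r - 2 * A @ A.T + r.T

print(diff.shape, dist_gram.shape)        # (200, 200, 64) (200, 200)
print(diff.nbytes // dist_gram.nbytes)    # 64, i.e. d times larger
```

Both routes give the same distances up to rounding, but the broadcasted intermediate grows with d while the matrix-product route does not.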
Here is a more general solution for two tensors of coordinates A and B:
def squared_dist(A, B):
    assert A.shape.as_list() == B.shape.as_list()
    row_norms_A = tf.reduce_sum(tf.square(A), axis=1)
    row_norms_A = tf.reshape(row_norms_A, [-1, 1])  # Column vector.
    row_norms_B = tf.reduce_sum(tf.square(B), axis=1)
    row_norms_B = tf.reshape(row_norms_B, [1, -1])  # Row vector.
    return row_norms_A - 2 * tf.matmul(A, tf.transpose(B)) + row_norms_B
Note that this is the squared distance. To get the Euclidean distance, apply tf.sqrt to the result, and add a small constant first to guard against floating-point instabilities: dist = tf.sqrt(squared_dist(A, B) + 1e-6).
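For illustration, here is a NumPy sketch of the same two-set formula. Two things in it are my own assumptions rather than part of the answer above: it only requires A and B to share the feature dimension (so the row counts may differ), and it clamps tiny negative values to zero before taking the square root instead of adding a constant. The name squared_dist_np is hypothetical.

```python
import numpy as np

def squared_dist_np(A, B):
    """Squared Euclidean distances between rows of A (m, d) and B (n, d)."""
    row_norms_A = np.sum(A ** 2, axis=1).reshape(-1, 1)   # column vector
    row_norms_B = np.sum(B ** 2, axis=1).reshape(1, -1)   # row vector
    sq = row_norms_A - 2 * A @ B.T + row_norms_B
    # Rounding can leave tiny negative values; clamp them before any sqrt.
    return np.maximum(sq, 0.0)

A = np.array([[0.0, 0.0], [3.0, 4.0]])
B = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])

dist = np.sqrt(squared_dist_np(A, B))
print(dist)
```

With these inputs the result has shape (2, 3), one row per point in A and one column per point in B.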
Alternatively, you can arrange the TensorFlow ops differently, for example by computing the Euclidean distance one row of x at a time:
def compute_euclidean_distance(x, y):
    size_x = x.shape.dims[0]
    size_y = y.shape.dims[0]
    for i in range(size_x):
        tile_one = tf.reshape(tf.tile(x[i], [size_y]), [size_y, -1])
        eu_one = tf.expand_dims(tf.sqrt(tf.reduce_sum(tf.pow(tf.subtract(tile_one, y), 2), axis=1)), axis=0)
        if i == 0:
            d = eu_one
        else:
            d = tf.concat([d, eu_one], axis=0)
    return d