How to count elements in tensorflow tensor? - python

I have a tensor for example : X = [1, 1, 0, 0, 1, 2, 2, 0, 1, 2].
And what I want is to reduce this tensor X to a tensor such as: Y = [3, 4, 3].
Where Y in position 0 is the count of how many 0s there are in X, and the position 1 how many 1s, so on and so forth.
What I'm doing right now is iterating through this tensor using the tf.where function. But this doesn't seem elegant, and there must be a better way to do it.
Thanks.

You are looking for tf.unique_with_counts.
import tensorflow as tf
X = tf.constant([1, 1, 0, 0, 1, 2, 2, 0, 1, 2])
op = tf.unique_with_counts(X)
sess = tf.InteractiveSession()
res = sess.run(op)
print(res.count)
# [4 3 3]
Beware that tf.bincount only handles non-negative integers. If your input tensor is not of integer type, or contains negative values, you must use tf.unique_with_counts. Otherwise tf.bincount is fine and to the point.
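The counts printed above come out as [4 3 3] rather than the [3, 4, 3] the question asks for, because they follow the order in which each value first appears in X (1, then 0, then 2). A minimal sketch of the reordering step, written in NumPy so it runs without a TensorFlow session; the names `y`, `count` and `per_value` are illustrative, and the first-appearance ordering is emulated here rather than taken from TensorFlow itself:

```python
import numpy as np

x = np.array([1, 1, 0, 0, 1, 2, 2, 0, 1, 2])

# np.unique sorts the values; reorder by first occurrence to mimic
# the order in which the answer above reports its counts.
values, first_idx, counts = np.unique(x, return_index=True, return_counts=True)
order = np.argsort(first_idx)
y, count = values[order], counts[order]
print(y, count)  # [1 0 2] [4 3 3]

# Scatter the counts back so that position v holds the count of value v.
per_value = np.zeros(y.max() + 1, dtype=count.dtype)
per_value[y] = count
print(per_value)  # [3 4 3]
```

The scatter step only makes sense for non-negative integer values; for other inputs, keep the (value, count) pairs instead.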

I think you are looking for Y = tf.bincount(X):
X = tf.constant([1, 1, 0, 0, 1, 2, 2, 0, 1, 2])
Y = tf.bincount(X)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
Y.eval()
# output
#[3, 4, 3]
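When X contains negative integers, you can shift the values so that the minimum maps to 0 (index i of the result then counts the value i + min(X)):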
tf.bincount(X + tf.abs(tf.reduce_min(X)) )
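To make the index-to-value mapping of the shift trick concrete, here is a small sketch in NumPy (an assumption made for easy checking; np.bincount has the same non-negative restriction). It subtracts the minimum instead of adding its absolute value, which gives the same result when the minimum is negative and also behaves sensibly when it is not:

```python
import numpy as np

x = np.array([-2, -1, -1, 0, 2, 2, 2])

# Shift so the smallest value maps to index 0, then count.
offset = x.min()
counts = np.bincount(x - offset)

# Index i of `counts` corresponds to the original value i + offset.
values = np.arange(counts.size) + offset
for v, c in zip(values, counts):
    print(v, c)
```

Values that never occur (here, 1) still get a slot with a count of 0.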

Related

PyTorch: How to create sample weights matrix from a tensor of number of frames

This has probably been answered before, but I could not come up with the right search query to find earlier answers, so apologies if it is a duplicate.
Let's say I have a batch size of 4 and a 1D tensor specifying the "unpadded" lengths of my input features input_lengths=[4, 6, 8, 10].
My feature tensor will be of shape (4, 10, C) for C dimensional features at each timestep. I want to create a sample weights matrix of shape (4, 10) which is for each datum in the batch, filled with ones up to the "unpadded" length of that datum. So, for the case above,
sample_weights = [[1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
I don't want to have a for-loop and was wondering if there's a more efficient way of creating this sample_weights matrix using torch.Tensor functions.
Thanks
Here are the two approaches that were benchmarked, one vectorized and one iterative.
import torch
from torch.nn.utils.rnn import pad_sequence
def build_w_vec(inputs):
    rows = len(inputs)
    cols = sorted(inputs)[-1]
    w = torch.zeros(rows, cols)
    # start and end steps are incremented by `1` to avoid null indexing when filling `w`
    aranges_list = list(map(lambda l: torch.arange(start=1, end=l + 1), inputs))
    aranges_tensor = pad_sequence(aranges_list, batch_first=True)
    w[aranges_tensor != 0] = 1
    return w
def build_w_iter(inputs):
    rows = len(inputs)
    cols = sorted(inputs)[-1]
    w = torch.zeros(rows, cols)
    for i, length in enumerate(inputs):
        arange = torch.arange(length)
        w[i, arange] = 1
    return w
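A third option, not part of the answer above, is a plain broadcast comparison between column indices and lengths. The sketch below uses NumPy so it can be run without PyTorch; the variable names are illustrative:

```python
import numpy as np

input_lengths = np.array([4, 6, 8, 10])
max_len = int(input_lengths.max())

# Compare every column index against every row's length via broadcasting:
# shape (1, max_len) < shape (batch, 1) -> shape (batch, max_len).
sample_weights = (
    np.arange(max_len)[None, :] < input_lengths[:, None]
).astype(np.float32)
print(sample_weights)

# PyTorch equivalent (not run here):
# (torch.arange(max_len)[None, :] < lengths[:, None]).float()
```

Each row then contains as many ones as the corresponding entry of `input_lengths`, followed by zeros.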

cosine similarity between a vector and pandas column(a linear vector)

I have a pandas data frame containing list of wines with their respective wine attributes.
Then I made a new column vector that contains numpy vectors from these attributes.
def get_wine_profile(id):
    wine = wines[wines['exclusiviId'] == id]
    wine_vector = np.array(wine[wine_attrs].values.tolist()).flatten()
    return wine_vector
wines['vector'] = wines.exclusiviId.apply(get_wine_profile)
Hence the vector column looks something like this:
vector
[1, 1, 1, 2, 2, 2, 2, 1, 1, 1]
[3, 1, 2, 1, 2, 2, 2, 0, 1, 3]
[1, 1, 2, 1, 3, 3, 3, 0, 1, 1]
.
.
Now I want to compute the cosine similarity between this column and another vector, which is built from the user input.
This is what I have tried so far:
from scipy.spatial.distance import cosine
cos_vec = wines.apply(lambda x: 1 - cosine(wines["vector"], [1, 1, 1, 2, 2, 2, 2, 1, 1, 1]), axis=1)
print(cos_vec)
This throws an error:
ValueError: ('operands could not be broadcast together with shapes (63,) (10,) ', 'occurred at index 0')
I also tried using sklearn, but it has the same problem with the array shape.
What I want as the final output is a column that holds the match score between this column and the user input.
A better solution IMO is to use cdist with cosine metric. You are effectively computing pairwise distances between n points in your DataFrame and 1 point in your user input, i.e. n pairs in total.
If you handle more than one user at a time, this would be even more efficient.
import numpy as np
from scipy.spatial.distance import cdist
# here df is the DataFrame that holds the vector column
# make into 1x10 array
user_input = np.array([1, 1, 1, 2, 2, 2, 2, 1, 1, 1])[None]
df["cos_dist"] = cdist(np.stack(df.vector), user_input, metric="cosine")
# vector cos_dist
# 0 [1, 1, 1, 2, 2, 2, 2, 1, 1, 1] 0.00000
# 1 [3, 1, 2, 1, 2, 2, 2, 0, 1, 3] 0.15880
# 2 [1, 1, 2, 1, 3, 3, 3, 0, 1, 1] 0.07613
By the way, it looks like you are using native Python lists. I would switch everything to numpy arrays. A conversion to np.array is happening under the hood anyway when you call cosine.
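Note that the values shown above are cosine distances, i.e. 1 minus the cosine similarity the question asks for. A minimal NumPy-only sketch of the same computation, using the three example vectors from the question (the array names are illustrative):

```python
import numpy as np

vectors = np.array([
    [1, 1, 1, 2, 2, 2, 2, 1, 1, 1],
    [3, 1, 2, 1, 2, 2, 2, 0, 1, 3],
    [1, 1, 2, 1, 3, 3, 3, 0, 1, 1],
], dtype=float)
user_input = np.array([1, 1, 1, 2, 2, 2, 2, 1, 1, 1], dtype=float)

# Cosine similarity of every row with the user vector in one shot:
# dot products divided by the product of the vector norms.
similarity = vectors @ user_input / (
    np.linalg.norm(vectors, axis=1) * np.linalg.norm(user_input)
)
distance = 1.0 - similarity  # the quantity shown in the cos_dist column
print(np.round(similarity, 5))
```

If a match score is wanted where higher means more similar, `similarity` (or `1 - cos_dist`) is the column to keep.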
Well, I wrote my own function to do this, and it works:
import math
def cosine_similarity(v1, v2):
    "compute cosine similarity of v1 to v2: (v1 dot v2) / (||v1|| * ||v2||)"
    sumxx, sumxy, sumyy = 0, 0, 0
    for i in range(len(v1)):
        x = v1[i]; y = v2[i]
        sumxx += x*x
        sumyy += y*y
        sumxy += x*y
    return sumxy/math.sqrt(sumxx*sumyy)
def get_similarity(id):
    # result_vector is the vector built from the user input
    vec1 = result_vector
    vec2 = get_wine_profile(id)
    similarity = cosine_similarity(vec1, vec2)
    return similarity
wines['score'] = wines.exclusiviId.apply(get_similarity)
display(wines.head())

Tensorflow compute multiplication by binary matrix

I have my data tensor which is of the shape [batch_size,512] and I have a constant matrix with values only of 0 and 1 which has the shape [256,512].
I would like to compute efficiently for each batch the sum of the products of my vector (second dimension of the data tensor) only for the entries which are 1 and not 0.
An explaining example:
let us say I have 1-sized batch: the data tensor has the values [5,4,3,7,8,2] and my constant matrix has the values:
[0,1,1,0,0,0]
[1,0,0,0,0,0]
[1,1,1,0,0,1]
it means that I would like to compute for the first row 4*3, for the second 5 and for the third 5*4*3*2.
and in total for this batch, I get 4*3+5+5*4*3*2 which equals to 137.
Currently, I do it by iterating over all the rows, compute elementwise the product of my data and constant-matrix-row and then sum, which runs pretty slow.
How about something like this:
import tensorflow as tf
# Two-element batch
data = [[5, 4, 3, 7, 8, 2],
        [4, 2, 6, 1, 6, 8]]
mask = [[0, 1, 1, 0, 0, 0],
        [1, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 1]]
with tf.Graph().as_default(), tf.Session() as sess:
    # Data as tensors
    d = tf.constant(data, dtype=tf.int32)
    m = tf.constant(mask, dtype=tf.int32)
    # Tile data as needed
    dd = tf.tile(d[:, tf.newaxis], (1, tf.shape(m)[0], 1))
    mm = tf.tile(m[tf.newaxis, :], (tf.shape(d)[0], 1, 1))
    # Replace values with 1 wherever the mask is 0
    w = tf.where(tf.cast(mm, tf.bool), dd, tf.ones_like(dd))
    # Multiply row-wise and sum
    result = tf.reduce_sum(tf.reduce_prod(w, axis=-1), axis=-1)
    print(sess.run(result))
Output:
[137 400]
EDIT:
If your input data is a single vector then you would just have:
import tensorflow as tf
# Single data vector
data = [5, 4, 3, 7, 8, 2]
mask = [[0, 1, 1, 0, 0, 0],
        [1, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 1]]
with tf.Graph().as_default(), tf.Session() as sess:
    # Data as tensors
    d = tf.constant(data, dtype=tf.int32)
    m = tf.constant(mask, dtype=tf.int32)
    # Tile data as needed
    dd = tf.tile(d[tf.newaxis], (tf.shape(m)[0], 1))
    # Replace values with 1 wherever the mask is 0
    w = tf.where(tf.cast(m, tf.bool), dd, tf.ones_like(dd))
    # Multiply row-wise and sum
    result = tf.reduce_sum(tf.reduce_prod(w, axis=-1), axis=-1)
    print(sess.run(result))
Output:
137
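The same "replace masked-out entries with 1, multiply, then sum" idea can be checked quickly in NumPy, where broadcasting removes the need for explicit tiling. This is a sketch for verifying the arithmetic, not a replacement for the TensorFlow graph above:

```python
import numpy as np

data = np.array([[5, 4, 3, 7, 8, 2],
                 [4, 2, 6, 1, 6, 8]])
mask = np.array([[0, 1, 1, 0, 0, 0],
                 [1, 0, 0, 0, 0, 0],
                 [1, 1, 1, 0, 0, 1]], dtype=bool)

# (1, rows, cols) mask against (batch, 1, cols) data broadcasts to
# (batch, rows, cols); masked-out entries become 1 so they do not
# affect the product.
w = np.where(mask[None, :, :], data[:, None, :], 1)
result = w.prod(axis=-1).sum(axis=-1)
print(result)  # [137 400]
```

For the first batch element this is 4*3 + 5 + 5*4*3*2 = 137, matching the worked example in the question.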

TensorFlow trim values in tensor

How do I perform the following in a TensorFlow tensor?
In matrix A: if A[i,j] > 1 then A[i,j] = 1
(in numpy I would do this: A[A>1] = 1)
You can use tf.minimum, which computes the element-wise minimum of x and y. With y = 1, every value in x greater than 1 is capped at 1:
A = tf.constant([-1, 0, 1, 3, 4])
A_clipped = tf.minimum(A, 1)
sess = tf.InteractiveSession()
A_clipped.eval()
# array([-1, 0, 1, 1, 1], dtype=int32)
Another option is to use tf.where to set the values:
tf.where(A > 1, tf.constant(1, shape=A.shape), A).eval()
# array([-1, 0, 1, 1, 1], dtype=int32)
If you need to update a Variable A in place:
A = tf.Variable([-1, 0, 1, 3, 4])
tf.global_variables_initializer().run()
tf.assign(A, tf.minimum(A, 1)).eval()
A.eval()
# array([-1, 0, 1, 1, 1], dtype=int32)

numpy indexing: add vector to parts of rows, starting at varying position

I have this 2d array of zeros z and this 1d array of starting points starts. In addition, I have a 1d array of offsets:
z = np.zeros(35, dtype='i').reshape(5, 7)
starts = np.array([1, 5, 3, 0, 3])
offsets = np.arange(5) + 1
I would like to vectorize this little for loop here, but I seem to be unable to do it.
for i in range(z.shape[0]):
    z[i, starts[i]:] += offsets[i]
The result in this example should look like this:
z
array([[0, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 2, 2],
       [0, 0, 0, 3, 3, 3, 3],
       [4, 4, 4, 4, 4, 4, 4],
       [0, 0, 0, 5, 5, 5, 5]])
We could use some masking and NumPy broadcasting -
mask = starts[:,None] <= np.arange(z.shape[1])
z[mask] = np.repeat(offsets, mask.sum(1))
We could play a trick of broadcasted multiplication to get the final output -
z = offsets[:,None] * mask
Another way would be to fill every row of z with its offset and then zero out the positions outside the mask, like so -
z[:] = offsets[:,None]
z[~mask] = 0
Yet another way would be to start from a replicated version of offsets as z and then mask out the rest -
z = np.repeat(offsets,z.shape[1]).reshape(z.shape[0],-1)
z[~mask] = 0
Of course, we would need the shape parameters before-hand.
If z is not initialized as zeros array, then only one of the solutions mentioned earlier would be applicable and that would need to be updated with +=, like so -
z[mask] += np.repeat(offsets, mask.sum(1))
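As a quick sanity check, the masked `+=` version can be compared against the original loop on an array that does not start as zeros. This is a small sketch; the names `z_loop` and `z_vec` are illustrative:

```python
import numpy as np

starts = np.array([1, 5, 3, 0, 3])
offsets = np.arange(5) + 1

# Reference: the original loop, on a non-zero starting array.
z_loop = np.ones((5, 7), dtype=int)
for i in range(z_loop.shape[0]):
    z_loop[i, starts[i]:] += offsets[i]

# Vectorized: boolean mask of the positions at or after each start.
z_vec = np.ones((5, 7), dtype=int)
mask = starts[:, None] <= np.arange(z_vec.shape[1])
# mask.sum(1) is the number of selected cells per row, so each offset
# is repeated once per cell it has to be added to.
z_vec[mask] += np.repeat(offsets, mask.sum(1))

print(np.array_equal(z_loop, z_vec))  # True
```

This works because boolean indexing visits the selected cells in row-major order, which is the same order in which np.repeat lays out the offsets.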
