Tensorflow: Gradient Calculation from Input to Output


I would like to calculate the gradients of the output of a neural network with respect to the input. I have the following tensors:
Input: (num_timesteps, features)
Output: (num_timesteps, 1)
For the gradients from the inputs to the entire output vector I can use the following:
tf.gradients(Output, Input)
Since I would like to compute the gradients for every single timesample I would like to calculate
tf.gradients(Output[i], Input)
for every i.
What is the best way to do that?

First up, I suppose you mean the gradient of Output with respect to the Input.
Now, the result of both of these calls:
dO = tf.gradients(Output, Input)
dO_i = tf.gradients(Output[i], Input) (for any valid i)
will be a list with a single element: a tensor with the same shape as Input, namely a [num_timesteps, features] matrix. Also, the sum of all matrices dO_i (over all valid i) is exactly the matrix dO.
With this in mind, back to your question. In many cases, individual rows from the Input are independent, meaning that Output[i] is calculated only from Input[i] and doesn't know other inputs (typical case: batch processing without batchnorm). If that is your case, then dO is going to give you all individual components dO_i at once.
This is because each dO_i matrix is going to look like this:
[[ 0. 0. 0.]
[ 0. 0. 0.]
...
[ 0. 0. 0.]
[ xxx xxx xxx] <- i-th row
[ 0. 0. 0.]
...
[ 0. 0. 0.]]
All rows are going to be 0, except for the i-th one. So just by computing one matrix dO, you can easily get every dO_i. This is very efficient.
However, if that's not your case and every Output[i] depends on all inputs, there's no way to extract the individual dO_i from their sum alone. You have no choice but to calculate each gradient separately: iterate over i and call tf.gradients(Output[i], Input) for each one.
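The row-independent case described above can be checked with a small sketch. It uses NumPy as a stand-in for TensorFlow, and the model (Output[i] = tanh(Input[i] · w)), the weight vector w, and the helper grad_of_output_i are all assumptions made up for illustration; they are not part of the question.

```python
import numpy as np

# Hypothetical row-wise model: Output[i] = tanh(Input[i] @ w),
# so Output[i] depends only on Input[i]. All names here are illustrative.
rng = np.random.default_rng(0)
num_timesteps, features = 4, 3
Input = rng.normal(size=(num_timesteps, features))
w = rng.normal(size=features)

def grad_of_output_i(i):
    """Analytic d Output[i] / d Input, a (num_timesteps, features) matrix."""
    g = np.zeros_like(Input)
    z = Input[i] @ w
    g[i] = (1.0 - np.tanh(z) ** 2) * w  # only row i is non-zero
    return g

dO_i = [grad_of_output_i(i) for i in range(num_timesteps)]
dO = sum(dO_i)  # the summed gradient, as a gradient of the whole Output would give

# Row i of dO is exactly the single non-zero row of dO_i.
for i in range(num_timesteps):
    assert np.allclose(dO[i], dO_i[i][i])
```

When rows do interact, this shortcut no longer holds and each dO_i has to be computed on its own. In TensorFlow 2.x, tf.GradientTape.jacobian may be worth a look as an alternative to an explicit loop.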

Related

Different results using affinity propagation "precomputed" distance matrix

I am working with two-dimensional data
X = array([[5.40310335, 0. ],
[6.86136114, 6.56225717],
[0. , 0. ],
...,
[5.88838732, 0. ],
[6.0003473 , 0. ],
[6.25971331, 0. ]])
and I am looking for clusters using Euclidean distance. I run affinity propagation from scikit-learn on this raw data as follows:
af = AffinityPropagation(damping=.9, max_iter=300, random_state=0).fit(X)
obtaining as a result 9 clusters.
I understand that when you want to use another distance you have to pass in the negative distance matrix and use affinity='precomputed', as follows:
af_c = AffinityPropagation(damping=.9, max_iter=300,
affinity='precomputed', random_state=0).fit(distM)
As distM I use the negative Euclidean distance matrix, calculated as follows:
distM_E = -distance_matrix(X,X)
np.fill_diagonal(distM_E, np.median(distM_E))
I fill the diagonal with the median, since that is also the default preference value in the method.
Using this I am getting 34 clusters as a result and I would expect to have 9 as if working with the default distance. I don't know if I'm interpreting the way of entering the distance matrix correctly or if the library does something different when one uses everything predefined.
I would appreciate any help.

difference between np.zeros((1, n)) and np.zeros(n)

I am trying to understand the difference between np.zeros((1, n)) and np.zeros(n)
row_vector = np.zeros((1, n))
vector = np.zeros(n)
print('Shape of row_vector: {0}'.format(row_vector.shape))
print('Shape of vector: {0}'.format(vector.shape))
The output for vector has one fewer pair of brackets:
Contents of row_vector:
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
Contents of vector - Note the number of brackets compared to row_vector:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Secondly, if I need to add them, how do I do that?

It simply boils down to the number of dimensions of the array or tensor. The row_vector is a 2 dimensional array, while vector is a 1 dimensional array.
You can easily verify this by calling the ndim attribute.
e.g.
>>> row_vector.ndim
2
>>> vector.ndim
1
This additional dimension is very useful when working with tensor focused libraries such as TensorFlow and PyTorch.

The difference is that one is a 1D array, the other 2D. You can see this by printing the x.ndim attribute or len(x.shape).
That being said, for many practical purposes there is no difference. Broadcasting aligns dimensions on the right edge, so you can add the two arrays directly using the + operator:
s = vector + row_vector
The result will be 2D, with shape (1, n).
There would be a big difference if the shapes were (n,) and (n, 1). Broadcasting would again align the shapes on the right, which would make the sum an outer sum of shape (n, n).
You can perform an outer sum on the existing arrays by giving vector a trailing unit axis with np.newaxis:
vector[:, np.newaxis] + row_vector
Or by transposing row_vector to shape (n, 1):
vector + row_vector.T
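The shape rules described in this answer can be checked with a short sketch. The value n = 4 and the array contents are arbitrary choices for illustration.

```python
import numpy as np

n = 4
vector = np.arange(n, dtype=float)  # shape (n,)
row_vector = np.ones((1, n))        # shape (1, n)

# Shapes are aligned on the right: (n,) is treated as (1, n).
elementwise = vector + row_vector             # -> shape (1, n)

# A trailing unit axis turns vector into (n, 1), so the sum broadcasts
# to every pair of entries: an outer sum of shape (n, n).
outer = vector[:, np.newaxis] + row_vector    # -> shape (n, n)

print(elementwise.shape)
print(outer.shape)
```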

Cosine similarity of list of values with each other

I am trying to find the cosine similarity of a list of strings. I used sklearn tfidf vector to convert the text into a numerical vector first and then used the pairwise cosine_similarity api to find the score for each string pair.
The strings seem similar, but I am getting a weird answer. The first and third value in the string array are similar except the word TRENTON, but the cosine similarity is 0. Similarly, the 1st,3rd and 4th string are the same, except for a space between GREEN and CHILLI and the cosine similarity is zero. Isn't that strange?
My code:
from sklearn.metrics import pairwise_kernels
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vectorizer=TfidfVectorizer()
values =['GREENCHILLI TRENTON'
,'GREENCHILLI'
,'GREEN CHILLI'
,'GREEN CHILLI']
X_train_counts = tfidf_vectorizer.fit_transform(values)
similarities = cosine_similarity(X_train_counts)
print(similarities)
Output
[[1. 0.6191303 0. 0. ]
[0.6191303 1. 0. 0. ]
[0. 0. 1. 1. ]
[0. 0. 1. 1. ]]

A comma (,) is missing between the last two 'GREEN CHILLI' strings, so tfidf is treating them as only 3 records, not 4.
If you correct it, you should see the cosine similarity below:
[[1. 0.6191303 0. 0. ]
[0.6191303 1. 0. 0. ]
[0. 0. 1. 1. ]
[0. 0. 1. 1. ]]
How to interpret the above matrix: the values in the nth row are the cosine similarities of that tfidf vector with all the vectors (in sequential order). So every diagonal entry will be 1, because every vector is identical to itself.

The first and third value in the string array values is similar except the word Trenton but cosine similarity is 0.
Similarly, 1st,3rd and 4th strings are same only space between GREEN and CHILLI and the cosine similarity is zero. isn't it strange?
It is not as strange as you might think. You will only get a non-zero cosine similarity if you have exact word matches between the strings that you compare. I will try to explain what happens:
When the TF-IDF vectorizer creates vectors from your list of strings, it starts by making a list of all words that occur.
So in your case, the list would look like this:
GREENCHILLI
TRENTON
GREEN
CHILLI
Now, every word becomes an axis in a coordinate system that the algorithm uses. All axes are perpendicular to each other.
So when you compare 'GREENCHILLI TRENTON' with 'GREEN CHILLI', the algorithm makes two vectors. The one from 'GREENCHILLI TRENTON' has a component parallel to 'GREENCHILLI' and a component parallel to 'TRENTON'. The one from 'GREEN CHILLI' has components in the 'GREEN' and 'CHILLI' directions of your coordinate system. The two vectors share no axis, so their dot product is zero, and the cosine similarity is zero as well.
So the gap in 'GREEN CHILLI' makes all the difference, when you compare it to 'GREENCHILLI'. The letters don't matter anymore, once the vectorizer made its coordinate system based on all the words it found in your list, because it identifies 'GREENCHILLI', 'GREEN' and 'CHILLI' as different words and makes them into perpendicular axes in its reference coordinate system.
Hope that makes it clearer. I suggest reading the following article series for a more in-depth understanding of what's going on:
http://blog.christianperone.com/2011/09/machine-learning-text-feature-extraction-tf-idf-part-i/
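The "one axis per word" picture above can be reproduced with a minimal sketch. It uses raw word counts instead of TF-IDF weights (a deliberate simplification, so non-zero scores will differ from what TfidfVectorizer gives), and the helper name cosine is made up for illustration. The zero entries come out the same, because they depend only on whether two strings share a word.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two strings using raw word counts (not TF-IDF)."""
    va = Counter(a.lower().split())
    vb = Counter(b.lower().split())
    # Only words present in both strings contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

values = ['GREENCHILLI TRENTON', 'GREENCHILLI', 'GREEN CHILLI', 'GREEN CHILLI']

sim_0_1 = cosine(values[0], values[1])  # share the word GREENCHILLI
sim_0_2 = cosine(values[0], values[2])  # share no word at all
sim_2_3 = cosine(values[2], values[3])  # identical word sets
print(sim_0_1, sim_0_2, sim_2_3)
```

If 'GREENCHILLI' and 'GREEN CHILLI' should count as similar, a character-level representation is one option to explore, for example character n-grams instead of whole words.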

How to plot a weighted graph of k-Neighbors in Python

I have created a weighted graph of k-Neighbors using scikit-learn, I'm wondering if there is any way to plot it as a graph.
Here is the result of computation in form of array which I want to plot:
array([[0. , 2.08243189, 0. , 3.42661108],
[2.08243189, 0. , 3.27141008, 0. ],
[0. , 3.27141008, 0. , 1.57294787],
[0. , 3.29779083, 1.57294787, 0. ]])
I just need to get some visualization of data, that's all I need.
More details about the array:
Each row represents a node and each column represents the weight of connectivity of that node with the other nodes.
For example: second column of first row (2.08243189) is the weight of connectivity from first node to second node.
Another example: second row, second column (0): the weight of connectivity from node 2 to itself.
The numbers represent Euclidean distances.

Are you talking about something simple like this where the size of the point gives a visual indication of the relative weight compared to the other values? Assume the array is named ar:
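import matplotlib.pyplot as plt

# ar is the weight matrix from the question (a NumPy array)
for i in range(len(ar)):
    for j in range(len(ar)):
        v = ar[i, j]
        # marker area grows as 10**v, so larger weights give larger points
        plt.scatter(i + 1, j + 1, lw=0, s=10**v)
plt.grid(True)
plt.xlabel('Row')
plt.ylabel('Column')
ticks = list(range(1, 1 + len(ar)))
plt.xticks(ticks)
plt.yticks(ticks)
plt.show()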

Dendrogram through scipy given a similarity matrix

I have computed a jaccard similarity matrix with Python. I want to cluster highest similarities to lowest, however, no matter what linkage function I use it produces the same dendrogram! I have a feeling that the function assumes that my matrix is of original data, but I have already computed the first similarity matrix. Is there any way to pass this similarity matrix through to the dendrogram so it plots correctly? Or am I going to have to output the matrix and simply do it with R. Passing through the original raw data is not possible, as I am computing similarities of words. Thanks for the help!
Here is some code:
SimMatrix = [[ 0.,0.09259259, 0.125 , 0. , 0.08571429],
[ 0.09259259, 0. , 0.05555556, 0. , 0.05128205],
[ 0.125 , 0.05555556, 0. , 0.03571429, 0.05882353],
[ 0. , 0. , 0.03571429, 0. , 0. ],
[ 0.08571429, 0.05128205, 0.05882353, 0. , 0. ]]
linkage = hcluster.complete(SimMatrix) #doesnt matter what linkage...
dendro = hcluster.dendrogram(linkage) #same plot for all types?
show()
If you run this code, you will see a dendrogram that is completely backwards. No matter what linkage type I use, it produces the same dendrogram. This intuitively can not be correct!

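Here's the solution. It turns out that SimMatrix first needs to be converted into a condensed matrix: a flat vector holding the entries of one off-diagonal triangle (upper right or lower left) of this matrix. Because the linkage function expects distances rather than similarities, the condensed values are also subtracted from 1.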
You can see this in the code below:
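# assumption: hcluster refers to scipy.cluster.hierarchy
import scipy.cluster.hierarchy as hcluster
import scipy.spatial.distance as ssd
from matplotlib.pyplot import show

distVec = ssd.squareform(SimMatrix)      # condensed similarities
linkage = hcluster.linkage(1 - distVec)  # similarity -> distance
dendro = hcluster.dendrogram(linkage)
show()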
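To make the "condensed matrix" idea concrete, here is a NumPy-only sketch of what the conversion amounts to for this symmetric, zero-diagonal matrix. The variable names sim_condensed and dist_condensed are illustrative, and the row-by-row ordering of the upper triangle is my understanding of the condensed layout SciPy expects.

```python
import numpy as np

SimMatrix = np.array([
    [0.0,        0.09259259, 0.125,      0.0,        0.08571429],
    [0.09259259, 0.0,        0.05555556, 0.0,        0.05128205],
    [0.125,      0.05555556, 0.0,        0.03571429, 0.05882353],
    [0.0,        0.0,        0.03571429, 0.0,        0.0],
    [0.08571429, 0.05128205, 0.05882353, 0.0,        0.0],
])

n = SimMatrix.shape[0]

# Indices of the strict upper triangle, row by row: (0,1), (0,2), ..., (3,4).
rows, cols = np.triu_indices(n, k=1)
sim_condensed = SimMatrix[rows, cols]   # n * (n - 1) / 2 entries

# Turn similarities into distances before clustering.
dist_condensed = 1.0 - sim_condensed

print(sim_condensed.shape)
print(dist_condensed[:4])
```

Passing the full square matrix instead would make the linkage routine treat each row as a raw observation vector, which would be consistent with the "backwards" dendrogram described in the question.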
