Get one hot vector in Keras embedding layer - python

Let's assume I have a number x.
This x can take a finite set of categorical values. Using the Embedding layer of Keras, x becomes a vector of the embedding dimension.
Assume:
E is the embedded vector,
o is the one-hot encoded vector of x,
W is the embedding matrix.
Then, to turn x into a vector, we compute E = Wo.
Now, in order to retrieve E from Keras, we can execute the following code:
K.function([model.layers[0].input], [model.get_layer('embedding').output])([input_data])
While to retrieve W from Keras, we can execute:
model.get_layer('embedding').get_weights()
However, I have no idea how to access the vector o, the one-hot encoded version of x, since Keras computes it automatically. Because I have no information about how the one-hot encoding is performed (precisely, which value is mapped to which component), reconstructing it some other way does not seem trivial.
Is there a way to do this?
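
For illustration, here is a minimal NumPy sketch of the relationship described above. It assumes (as is the case for the Keras Embedding layer, as far as I know) that the input x is already an integer index and that the layer looks up row x of its weight matrix rather than materialising a one-hot vector. Under that assumption, o is simply the vector with a 1 at position x. The sizes and the random matrix W below are hypothetical stand-ins for model.get_layer('embedding').get_weights()[0], which is stored with shape (number of categories, embedding dimension), so the product is written o @ W here.

```python
import numpy as np

vocab_size, embed_dim = 5, 3                    # hypothetical sizes
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, embed_dim))    # stand-in for get_weights()[0]

x = 2                                           # categorical value, used directly as an index

# One-hot vector implied by x: a single 1 at position x.
o = np.zeros(vocab_size)
o[x] = 1.0

E_matmul = o @ W                                # "E = Wo" with W stored as (vocab, dim)
E_lookup = W[x]                                 # row lookup, no one-hot vector needed

print(np.allclose(E_matmul, E_lookup))
```

If this assumption holds for your model, the mapping "value to component" is the identity: whatever integer you feed in is the index of the non-zero component of o.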

Related

What do W and U notate in a GRU?

I'm trying to figure out how to backpropagate a GRU Recurrent network, but I'm having trouble understanding the GRU architecture precisely.
The image below shows a GRU cell with 3 neural networks, receiving the concatenated previous hidden state and the input vector as its input.
GRU example
The image I referenced for backpropagation, however, shows the inputs being passed through W and U for each of the gates, added, and then passed through the appropriate activation function.
GRU Backpropagation
The equation for the update gate shown on Wikipedia, as an example, is:
z_t = sigmoid(W^(z) x_t + U^(z) h_{t-1})
Can somebody explain to me what W and U represent?
EDIT:
In most of the sources I found, W and U are usually referred to as "weights", so my best guess is that W and U represent their own neural networks, but this would contradict the image I found before.
If somebody could give an example of how W and U would work in a simple GRU, that would be helpful.
Sources for the images:
https://cran.r-project.org/web/packages/rnn/vignettes/GRU_units.html
https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45
W and U are matrices whose values are learnt during training (a.k.a. neural network weights). The matrix W multiplies the vector x_t and produces a new vector. Similarly, the matrix U multiplies the vector h_{t-1} and produces a new vector. Those two new vectors are added together, and then each component of the result is passed to the sigmoid function.
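
As a small illustration of the answer above, the update gate can be sketched in NumPy. The dimensions and random values are hypothetical. The sketch also shows why the two pictures in the question need not contradict each other: multiplying the concatenated vector [x_t, h_{t-1}] by the side-by-side matrix [W, U] gives the same result as computing W x_t + U h_{t-1} separately.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

input_dim, hidden_dim = 3, 2                      # hypothetical sizes
rng = np.random.default_rng(0)

W_z = rng.normal(size=(hidden_dim, input_dim))    # weights applied to the input x_t
U_z = rng.normal(size=(hidden_dim, hidden_dim))   # weights applied to the previous state
x_t = rng.normal(size=input_dim)
h_prev = rng.normal(size=hidden_dim)

# Update gate as in the equation: two matrix-vector products, added, then sigmoid.
z_t = sigmoid(W_z @ x_t + U_z @ h_prev)

# Same gate written with one matrix acting on the concatenated vector.
WU = np.hstack([W_z, U_z])                        # shape (hidden_dim, input_dim + hidden_dim)
xh = np.concatenate([x_t, h_prev])                # shape (input_dim + hidden_dim,)
z_concat = sigmoid(WU @ xh)

print(np.allclose(z_t, z_concat))
```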

Which data format does keras model.fit function need?

I would like to know which data format the model.fit function of Keras needs. The documentation is not specific enough for me.
It seems that for an LSTM model it needs a 3D array for the parameter x.
Some more specific questions:
Does the data format depend on the chosen model?
What is the meaning of each dimension of x?
And what is the meaning of y?
Thanks in advance to anybody who can tell me a bit about that!
Holger
The data format certainly depends on the model. You can have models that have multiple inputs, such as Siamese networks.
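In the case of an LSTM that follows an Embedding layer, I believe the input to the model is 2-D, as in this example; the LSTM layer itself receives 3-D input indexed as [example_index, timestep, feature], which the Embedding layer produces from the 2-D integer input. That example loads data from the IMDB dataset. The relevant line of code there is: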
xs = [[oov_char if (w >= num_words or w < skip_top) else w for w in x] for x in xs]
The first dimension corresponds to different examples, and the second dimension is the timestep.
As for y, that refers to the labels. In a sequence-to-sequence example, this would also be two-dimensional, with the same [example_index, timestep] indexing. However, in classification it would be 1-dimensional, with one label for each example.
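
To make the shapes discussed above concrete, here is a small NumPy sketch. The sizes are hypothetical and the arrays are filled with zeros only to show their layout; which of the two x layouts applies depends on whether the model starts with an Embedding layer (2-D integer indices) or feeds features straight into the LSTM (3-D floats).

```python
import numpy as np

num_examples, timesteps, num_features = 4, 6, 3   # hypothetical sizes

# x for a model whose first layer is an Embedding: one integer word index per timestep.
x_indices = np.zeros((num_examples, timesteps), dtype="int32")

# x for a model that feeds real-valued features directly into an LSTM layer.
x_features = np.zeros((num_examples, timesteps, num_features), dtype="float32")

# y for classification: one label per example.
y_class = np.zeros((num_examples,), dtype="int32")

# y for a sequence-to-sequence style task: one label per example and timestep.
y_seq = np.zeros((num_examples, timesteps), dtype="int32")

for name, arr in [("x_indices", x_indices), ("x_features", x_features),
                  ("y_class", y_class), ("y_seq", y_seq)]:
    print(name, arr.shape)
```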

How to pass 3d Tensor to tensorflow RNN embedding_rnn_seq2seq

I'm trying to feed sentences in which each word has a word2vec representation.
How can I do it in TensorFlow seq2seq models?
Suppose the variable
enc_inp = [tf.placeholder(tf.int32, shape=(None, 10), name="inp%i" % t)
           for t in range(seq_length)]
which has dimensions [num_of_observations or batch_size x word_vec_representation x sentence_length].
When I pass it to embedding_rnn_seq2seq
decode_outputs, decode_state = seq2seq.embedding_rnn_seq2seq(
    enc_inp, dec_inp, stacked_lstm,
    seq_length, seq_length, embedding_dim)
an error occurs:
ValueError: Linear is expecting 2D arguments: [[None, 10, 50], [None, 50]]
There is also a more complex problem:
How can I pass a vector, not a scalar, as input to the first cell of my RNN?
Right now it looks like this (for any sequence):
get first value of sequence (scalar)
compute First layer RNN First layer embedding cell output
compute First layer RNN Second layer embedding cell output
etc
But this is what is needed:
Get first value of sequence (vector)
compute First layer RNN First layer cell output (as ordinary computing simple perceptron when Input is a vector)
compute First layer RNN Second layer embedding cell output (as ordinary computing simple perceptron when Input is a vector)
The main point is that:
the seq2seq functions perform the word embedding internally.
Here is a Reddit question and answer.
Also, if somebody wants to use pretrained Word2Vec, there are ways to do it,
see:
stackoverflow 1
stackoverflow 2
So this can be used not only for word embeddings.
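
The shapes in the error message above are consistent with an embedding lookup being applied to the [None, 10] integer placeholders. The following NumPy sketch only reproduces that shape arithmetic; it is not the seq2seq implementation, and the sizes and the random embedding table are hypothetical. It suggests that when the embedding is done internally, each per-timestep input should hold one integer id per example (shape [batch]), so that the lookup yields a 2-D [batch, embedding_dim] array.

```python
import numpy as np

vocab_size, embedding_dim, batch_size = 100, 50, 4   # hypothetical sizes
rng = np.random.default_rng(0)
embedding = rng.normal(size=(vocab_size, embedding_dim))

# One word id per example for a single timestep: shape [batch].
ids_per_step = np.zeros((batch_size,), dtype=np.int64)
step_2d = embedding[ids_per_step]       # lookup gives [batch, embedding_dim]

# Ten integers per example for a single timestep, as in the question: shape [batch, 10].
ids_wide = np.zeros((batch_size, 10), dtype=np.int64)
step_3d = embedding[ids_wide]           # lookup gives [batch, 10, embedding_dim]

print(step_2d.shape, step_3d.shape)
```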

Creating dynamic matrix for every element in a batch in TensorFlow

I am working on a siamese CNN with attention in TensorFlow.
The CNN structure consists of an embedding lookup table shared by two CNNs that share weights.
The inputs for the network are two matrices, both containing indices for question and answer to be fed into the network (batch_size x sentence_length):
self.input_q = tf.placeholder(tf.int32, [None, sentence_length], name="input_q")
self.input_a = tf.placeholder(tf.int32, [None, sentence_length], name="input_a")
After embedding each sentence (row from the input matrix) I end up with two tensors (question and answer), each of size batch_size x sentence_length x embedding_size.
Let's forget for now about the batch dimension to make things easier. That is to say, we have two matrices Qemb and Aemb, both sentence_length x embedding_size.
From these two matrices I would like to construct a third one, an attention matrix A, which is later used to build a learnable attention feature matrix. Using numpy, it would be defined as follows:
A[i,j] = 1.0 / (1.0 + np.linalg.norm(Qemb[i,:]-Aemb[j,:]))
This matrix is built for each input pair, so it should be part of the graph, but apparently this cannot be done in TensorFlow, as there is no assign-by-index operation for a Tensor.
Am I right?
I thought I could run the ops for embedding the question and answer, build the A matrix outside the graph from the computed embeddings, and then feed the A matrix back into the graph to continue with the next operations that depend on it.
self.attention_matrix = tf.placeholder(
    tf.float32,
    [None, sentence_length, sentence_length],
    name="Attention_matrix")
Is there any problem with this approach that I might not be aware of?
(Apart from running the embedding ops twice, which doesn't seem optimal but is not a big deal.)
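
One possible alternative to index-by-index assignment is to compute the whole matrix at once with broadcasting. The sketch below does this in NumPy for a full batch and checks it against the element-wise definition of A given above; the function name attention_matrix and the sizes are hypothetical. The same pattern should be expressible inside a TensorFlow graph with ops such as tf.expand_dims, tf.reduce_sum and tf.sqrt, though that is an assumption not tested here.

```python
import numpy as np

def attention_matrix(Q, A):
    """Q: [batch, len_q, emb], A: [batch, len_a, emb] -> [batch, len_q, len_a]."""
    # Insert singleton axes so every question row is paired with every answer row.
    diff = Q[:, :, None, :] - A[:, None, :, :]       # [batch, len_q, len_a, emb]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))       # Euclidean norm over the embedding axis
    return 1.0 / (1.0 + dist)

batch_size, sentence_length, embedding_size = 2, 4, 3    # hypothetical sizes
rng = np.random.default_rng(0)
Qemb = rng.normal(size=(batch_size, sentence_length, embedding_size))
Aemb = rng.normal(size=(batch_size, sentence_length, embedding_size))

A_vec = attention_matrix(Qemb, Aemb)

# Reference: the element-wise definition from the question, applied per batch element.
A_loop = np.zeros((batch_size, sentence_length, sentence_length))
for b in range(batch_size):
    for i in range(sentence_length):
        for j in range(sentence_length):
            A_loop[b, i, j] = 1.0 / (1.0 + np.linalg.norm(Qemb[b, i, :] - Aemb[b, j, :]))

print(np.allclose(A_vec, A_loop))
```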

TensorFlow Multi-Layer Perceptron

I am learning TensorFlow, and my goal is to implement a multi-layer perceptron for my needs. I checked the MNIST tutorial with the multi-layer perceptron implementation and everything was clear to me except this:
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                              y: batch_y})
I guess x is the image itself (28*28 pixels, so the input is 784 neurons) and y is a label, which is a 1x10 array:
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
They feed whole batches (which are packs of data points and labels)! How does tensorflow interpret this "batch" input? And how does it update the weights: simultaneously after each element in a batch, or after running through the whole batch?
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into session?
When I ask how TensorFlow interprets it, I want to know how TensorFlow splits the batch into single elements. For example, a batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything?
When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples in its rows and features in its columns, or whether it could be a 2-D list.
When I feed my float numpy array X_train to x, which is:
x = tf.placeholder("float", [1, n_input])
I receive an error:
ValueError: Cannot feed value of shape (1, 18) for Tensor 'Placeholder_10:0', which has shape '(1, 1)'
It appears that I have to create my data as a Tensor too?
When I tried [18x1]:
Cannot feed value of shape (18, 1) for Tensor 'Placeholder_12:0', which has shape '(1, 1)'
They feed whole batches (which are packs of data points and labels)!
Yes, this is how neural networks are usually trained (mini-batches give the best of two worlds: a better gradient approximation than single-sample SGD on one hand, and much faster convergence than full-batch GD on the other).
How does tensorflow interpret this "batch" input?
It "interprets" it according to the operations in your graph. You probably have a reduce_mean somewhere in your graph, which calculates the average over your batch, and this is what causes that "interpretation".
And how does it update the weights: 1. simultaneously after each element in a batch? 2. After running through the whole batch?
As in the previous answer: there is nothing "magical" about a batch. It is just another dimension, and each internal operation of the neural net is well defined for a batch of data, so there is still a single update at the end. Since you use a reduce_mean operation (or maybe reduce_sum?), you are updating according to the mean of the per-sample gradients (or their sum, if there is a reduce_sum instead). Again, you can control this (up to the aggregation behaviour; you cannot force a per-sample update unless you introduce a while loop into the graph).
And if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into the session? Thanks!
Just set the variables n_input=1 and n_classes=4, and push your data as before, as [batch, n_input] and [batch, n_classes] arrays (in your case batch=1, if by "1x1" you mean "one sample of dimension 1"; your edit suggests that you actually do have a batch, and that by 1x1 you meant a 1-D input).
EDIT: 1. When I ask how TensorFlow interprets it, I want to know how TensorFlow splits the batch into single elements. For example, a batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything? 2. When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples in its rows and features in its columns, or whether it could be a 2-D list.
It does not split anything. It is just a matrix, and each operation is perfectly well defined for matrices as well. Usually you put examples in rows, thus in the first dimension, and this is exactly what [batch, n_inputs] says: you have batch rows, each with n_inputs columns. But again, there is nothing special about it, and you could also create a graph that accepts column-wise batches if you really needed to.
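
The point that a batch is "just a matrix" can be sketched in NumPy for a single linear layer with n_input=1 and n_classes=4. The weights, data and the mean-squared-error cost are hypothetical choices for illustration, not the tutorial's model. One matrix product handles all rows at once and gives the same values as processing the rows one by one, and the mean over the batch reduces everything to a single scalar cost, hence a single update.

```python
import numpy as np

n_input, n_classes, batch = 1, 4, 3              # one input number, four outputs, three samples
rng = np.random.default_rng(0)

W = rng.normal(size=(n_input, n_classes))        # hypothetical layer weights
b = np.zeros(n_classes)
X = rng.normal(size=(batch, n_input))            # samples in rows, features in columns
Y = rng.normal(size=(batch, n_classes))          # one target row per sample

# One matrix product processes the whole batch; nothing is split.
out_batch = X @ W + b                            # shape (batch, n_classes)

# Processing each row separately yields the same rows.
out_rows = np.stack([X[i] @ W + b for i in range(batch)])

# A mean over the batch turns per-sample losses into one scalar cost.
per_sample_loss = np.mean((out_batch - Y) ** 2, axis=1)   # shape (batch,)
cost = per_sample_loss.mean()

print(out_batch.shape, np.allclose(out_batch, out_rows), float(cost))
```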
