How do "de-embed" words in TensorFlow - python

I am trying to follow the tutorial for Language Modeling on the TensorFlow site. It runs, the cost goes down, and it works great, but I do not see any way to actually get the predictions from the model. I tried following the instructions at this answer, but the tensors returned from session.run are floating-point values like 0.017842259, and the dictionary maps words to integers, so that does not work.
How can I get the predicted word from a tensorflow model?
Edit: I found this explanation after searching around; I am just not sure what x and y would be in the context of this example. They don't seem to use the same conventions for this example as they do in the explanation.

The tensor you are mentioning is the loss, which is what drives training. For prediction, you need to access the tensor probabilities, which contains the probabilities for the next word. If this were a classification problem, you'd just do argmax to get the top probability. But to also give lower-probability words a chance of being generated, some kind of sampling is often used.
Edit: I assume the code you used is this. In that case, look at line 148 (logits); the logits can be converted into probabilities by simply applying the softmax function, as shown in the pseudocode on the TensorFlow website. Hope this helps.
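For reference, a minimal sketch of that conversion plus a sampled alternative to argmax, assuming logits has shape [batch_size * num_steps, vocab_size] and feed_dict is whatever the run loop already builds:
import numpy as np
import tensorflow as tf

probabilities = tf.nn.softmax(logits)           # turn logits into a distribution over the vocabulary
prob = session.run(probabilities, feed_dict=feed_dict)
greedy_ids = np.argmax(prob, axis=1)            # greedy: highest-probability word id per position
sampled_id = np.random.choice(prob.shape[1], p=prob[-1])  # sample the next word id for the last position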

So after going through a bunch of other similar posts I figured this out. First, the code explained in the documentation is not the same as the code on the GitHub repository. The current code works by initializing models with data inside instead of passing data to the model as it goes along.
So basically to accomplish what I was trying to do, I reverted my code to commit 9274f5a (also do the same for reader.py). Then I followed the steps taken in this post to get the probabilities tensor in my run_epoch function. Additionally, I followed this answer to pass the vocabulary to my main function. From there, I inverted the dict using vocabulary = {v: k for k, v in vocabulary.items()} and passed it to run_epoch.
Finally, we can get the predicted word in run_epoch by running current_word = vocabulary[np.argmax(prob, 1)], where prob is the tensor returned from session.run().
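A rough sketch of that lookup, assuming word_to_id is the dictionary built by reader.py and prob is the array of per-step probabilities fetched from session.run():
import numpy as np

vocabulary = {v: k for k, v in word_to_id.items()}   # invert word -> id into id -> word
predicted_ids = np.argmax(prob, axis=1)              # one predicted word id per step
predicted_words = [vocabulary[int(i)] for i in predicted_ids]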
Edit: Reverting the code as such should not be a permanent solution, and I definitely recommend using @Prophecies' answer above to get the probabilities tensor. However, if you want to get the word mapping, you will need to pass the vocabulary as I did here.

Related

Predict values with tensorflow recommender system model using context features

I am currently trying to build a recommender system with TensorFlow on my own dataset (user, item, weekday). I have a first version that just uses user-item interactions as a basis. Now I want to extend it with a context feature (weekday of interaction) like here.
I adapted my model and it trains fine; TensorFlow's model.evaluate() also works. As I am trying to compare the results to some self-written models, I need to use exactly the same metrics. So I tried to get a prediction for every interaction and then calculate the metrics my way.
This led to problems with the format of the data, as I have to give the user_id as well as the weekday. So I tried going back to the aforementioned example and getting results either by using model.predict() or by using tfrs.layers.factorized_top_k.BruteForce(), as described for example here.
In the first mentioned notebook, I added the following code at the end:
index = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
index.index_from_dataset(
    tf.data.Dataset.zip((movies.batch(100),
                         movies.batch(100).map(model.candidate_model)))
)
predictions_1 = index(ratings, 10)
predictions_2 = model.predict(cached_test)
BruteForce Way
Trying to get predictions_1 gives me
'CacheDataset' object is not subscriptable
in the call() of the UserModel. I understand that this is caused by trying to access inputs[something] when inputs does not allow indexing like this, but I don't know what the correct way is to use instead. I tried creating other Dataset objects (like MapDataset etc.), but none of them are subscriptable. Then I tried building up a Tensor and accessing it with indexing [0, :] for user_id etc. That does not work either because the Sequential layer can't handle slices. Converting to numpy does not work either.
model.predict way
Trying to get predictions_2, I implemented the call()-function in the MovieLensModel as described here:
def call(self, inputs):
    query_embeddings = self.query_model({
        "user_id": inputs["user_id"],
        "timestamp": inputs["timestamp"],
    })
    movie_embeddings = self.candidate_model(inputs["movie_title"])
    return tf.matmul(query_embeddings, movie_embeddings, transpose_a=True)
I know that this cannot be the final or correct way, but I see it as a first try. I am not trying to get the final result yet, just some kind of interaction matrix.
However, I do get a result, but it has shape (160, 32). Since 32 is the embedding dimension and there are far more users and movies (942 and 1425) in the testing data, I don't know where the 160 comes from. Both embedding results have shape (None, 32). I thought about batches, but then I would expect multiple sub-tensors in the result.
Moreover, the debugger does not step into the call() function, yet somehow I can print-debug from there. So it seems to be executed, but I can't step into it?
Questions
Has anyone used the TFRS with context features and found a way to predict item values for a (user, feature)-combination?
Does anyone have any idea which Datatype I can use for the first prediction try?
How is the idea behind the second approach wrong? How should it be done?
Are those even good ideas, or is this completely the wrong approach because I'm missing something?
EDIT:
I found out that one of the problems is the feeding of batches to the BruteForce layer. If I do the following, I get reasonable results in the form of a tensor containing k movie titles and the corresponding ratings:
for batch in cached_test:
    predictions = index(batch, 10)
Nevertheless, this cannot be the preferred way, as I get warnings because I feed a dict to the model.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: ...
So this seems like a workaround, and the question of the intended way to do this remains.
Versions
I am running:
tensorflow: 2.7.0
tensorflow-recommenders: 0.6.0
python: 3.8.5
I was facing a similar issue. I resolved it by passing the data in dict format through a DataFrame:
prediction_1 = index(dict(df.loc[i, ['user_id', 'weekday']].map(
    lambda x: tf.expand_dims(x, axis=0))))
Note: If your issue is not resolved yet, share your model code...
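For what it's worth, a rough equivalent without pandas might look like the following sketch; the feature names, dtypes, and example values are assumptions and must match what the query model was trained on:
query = {
    "user_id": tf.constant(["42"]),
    "weekday": tf.constant(["Monday"]),
}
scores, titles = index(query, 10)   # top-10 candidates for this single (user, weekday) query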

Inner workings of Gensim Word2Vec

I have a couple of issues regarding Gensim in its Word2Vec model.
The first is: what happens if I set it to train for 0 epochs? Does it just create the random vectors and call it done? So they have to be random every time, correct?
The second concerns the WV object; the doc page says:
This object essentially contains the mapping between words and embeddings.
After training, it can be used directly to query those embeddings in various ways.
See the module level docstring for examples.
But that is not clear to me. Allow me to explain: I have my own pre-created word vectors, which I substitute in via
word2vecObject.wv['word'] = my_own
and then call the train method with those replacement word vectors. But I would like to know which part I am replacing: is it the input-to-hidden weight layer or the hidden-to-output one? This is to check whether it can be called pre-training or not. Any help? Thank you.
I've not tried the nonsense parameter epochs=0, but it might behave as you expect. (Have you tried it and seen otherwise?)
However, if your real goal is to be able to tamper with the model after initialization, but before training, the usual way to do that is to not supply any corpus when constructing the model instance, and instead manually do the two followup steps, .build_vocab() & .train(), in your own code - inserting extra steps between the two. (For even finer-grained control, you can examine the source of .build_vocab() & its helper methods, and simply ensure you do all those necessary things, with your own extra steps interleaved.)
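A minimal sketch of that pattern with a recent Gensim (sentences and my_vector are placeholders here, and the exact constructor arguments depend on your Gensim version):
from gensim.models import Word2Vec

model = Word2Vec(vector_size=100, min_count=1)   # no corpus yet, so nothing is trained
model.build_vocab(sentences)                     # builds the vocab and randomly initializes vectors
model.wv['word'] = my_vector                     # tamper with the input-projection vector here
model.train(sentences, total_examples=model.corpus_count, epochs=5)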
The "word vectors" in the .wv property of type KeyedVectors are essentially the "input projection layer" of the model: the data which converts a single word into a vector_size-dimensional dense embedding. (You can think of the keys – word token strings – as being somewhat like a one-hot word-encoding.)
So, assigning into that structure only changes that "input projection vector", which is the "word vector" usually collected from the model. If you need to tamper with the hidden-to-output weights, you need to look at the model's .syn1neg (or .syn1 for HS mode) property.
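A quick way to see the two weight matrices side by side (assuming a model trained with the default negative sampling):
print(model.wv.vectors.shape)   # input projection, i.e. the usual "word vectors": (vocab_size, vector_size)
print(model.syn1neg.shape)      # hidden-to-output weights used by negative sampling: same shape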

Prediction and forecasting with Python tensorflow

I have created a prediction model and used RNN in it offered by the tensorflow library in Python. Here is the complete code I have created and tried:
Jupyter Notebook of the Code
But I have doubts.
1) Whether RNN is correct for what I am trying to predict?
2) Is there a better algorithm I can try?
3) Can anyone suggest how I can give multiple inputs and get the necessary output using a tensorflow model? Can anyone guide me, please?
I hope I am clear on my points. Please do tell me if anything else is required.
Having doubts is normal, but you should try to measure them before asking for advice. If you don't have a clear thing you want to improve, it's unlikely you will get something better.
1) Whether RNN is correct for what I am trying to predict?
Yes. RNN is used appropriately here. If you don't care much about having arbitrary-length input sequences, you can also try to force them to a fixed size and then apply convolutions on top (see convolutional neural networks), or even try a simpler DNN.
The more important question to ask yourself is if you have the right inputs and if you have sufficient training data to learn what you hope to learn.
2) Is there a better algorithm I can try?
Probably not. As I said, RNN seems appropriate for this problem. Do try some hyperparameter tuning to make sure you don't accidentally just pick a sub-optimal configuration.
3) Can anyone suggest me how I can give multiple inputs and get the necessary output using tensorflow model? Can anyone guide me please.
The common way to handle variable length inputs is to set a max length and pad the shorter examples until they reach that length. The max length can be a variable you pick or you can dynamically set it to the largest length in the batch. This is needed only because the internal operations are done in batches. You can pick which results you want. Picking the last one is reasonable (the model will just have to learn to propagate the state for the padding values). Another reasonable thing to do is to pick the first one you get after feeding the last meaningful value into the RNN.
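As a hedged illustration of that padding idea (the names here are illustrative, not taken from your notebook):
import numpy as np

def pad_batch(sequences, max_len, pad_value=0.0):
    """Pad each variable-length sequence in the batch to max_len and remember the true lengths."""
    padded = np.full((len(sequences), max_len), pad_value, dtype=np.float32)
    lengths = []
    for i, seq in enumerate(sequences):
        seq = seq[:max_len]
        padded[i, :len(seq)] = seq
        lengths.append(len(seq))
    return padded, np.array(lengths)   # lengths tell you which output is the last meaningful one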
Looking at your code, there's one thing I would improve:
Instead of computing a loss on the last value only, I would compute it over all values in the series. This gives your model more training data with very little performance degradation.
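For example, rather than something like loss = tf.reduce_mean(tf.squared_difference(outputs[:, -1], targets[:, -1])), one could average the error over every step. A sketch, assuming outputs and targets both have shape [batch, num_steps]:
step_losses = tf.squared_difference(outputs, targets)   # per-step squared error
loss = tf.reduce_mean(step_losses)                       # average over the batch and all steps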

How to get predictions out of tensorflow model after you've used tf.group on your optimizers

I'm trying to write something similar to Google's wide and deep learning after running into difficulties doing multi-class classification (12 classes) with the sklearn api. I've tried to follow the advice in a couple of posts and used tf.group(logistic_regression_optimizer, deep_model_optimizer). It seems to work, but I was trying to figure out how to get predictions out of this model. I'm hoping that with the tf.group operator the model is learning to weight the logistic and deep models differently, but I don't know how to get these weights out so I can get the right combination of the two models' predictions. Thanks in advance for any help.
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/Cs0R75AGi8A
How to set layer-wise learning rate in Tensorflow?
tf.group() creates a node that forces a list of other nodes to run using control dependencies. It's really just a handy way to package up logic that says "run this set of nodes, and I don't care about their output". In the discussion you point to, it's just a convenient way to create a single train_op from a pair of training operators.
If you're interested in the value of a Tensor (e.g., weights), you should pass it to session.run() explicitly, either in the same call as the training step, or in a separate session.run() invocation. You can pass a list of values to session.run(), for example, your tf.group() expression, as well as a Tensor whose value you would like to compute.
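A small sketch of both points; the optimizer and tensor names below are assumptions, not from your code:
# package the two optimizers into a single training op
train_op = tf.group(logistic_regression_optimizer, deep_model_optimizer)

# fetch the training op and any tensor you care about in one call
_, current_predictions = session.run([train_op, predictions], feed_dict=feed_dict)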
Hope that helps!

Alex-Net for feature extraction

I am trying to get reliable features for ImageNet to do further classification on them. To achieve that, I would like to use TensorFlow with AlexNet for feature extraction; that means I would like to get the values from the last layer in the CNN. Could someone write a piece of Python code that explains how that works?
As jonrsharpe mentioned, that's not really stackoverflow's MO, but in practice, many people do choose to write code to help explain answers (because it's often easier).
So I'm going to assume that this was just miscommunication, and you really intended to ask one of the following two questions:
How does one grab the values of the last layer of Alexnet in TensorFlow?
How does feature extraction from the last layer of a deep convolutional network like alexnet work?
The answer to the first question is actually very easy. I'll use the cifar10 example code in TensorFlow (which is loosely based on AlexNet) as an example. The forward pass of the network is built in the inference function, which returns a variable representing the output of the softmax layer. To actually get predicted image labels, you just argmax the logits, like this: (I've left out some of the setup code, but if you're already running alexnet, you already have that working)
logits = cifar10.inference(images)
predictions = tf.argmax(logits,1)
# Actually run the computation
labels = session.run([predictions])
So grabbing just the last layer features is literally just as easy as asking for them. The only wrinkle is that, in this case, cifar10 doesn't natively expose them, so you need to modify the cifar10.inference function to return both:
# old code in cifar10.inference:
# return softmax_linear
# new code in cifar10.inference:
return softmax_linear, local4
And then modify all the calls to cifar10.inference, like the one we just showed:
logits,local4 = cifar10.inference(images)
predictions = tf.argmax(logits,1)
# Actually run the computation, this time asking for both answers
labels,last_layer = session.run([predictions, local4])
And that's it. last_layer contains the last layer for all of the inputs you gave the model.
As for the second question, that's a much deeper question, but I'm guessing that's why you want to work on it. I'd suggest starting by reading up on some of the papers published in this area. I'm not an expert here, but I do like Bolei Zhou's work. For instance, try looking at Figure 2 in "Learning Deep Features for Discriminative Localization". It's a localization paper, but it's using very similar techniques (and several of Bolei's papers use it).
