I am currently trying to build a recommender system with TensorFlow on my own dataset (user, item, weekday). I have a first version that just uses user-item-interactions as a basis. Now I want to extend it with a context feature (weekday of interaction) like here.
I adapted my model and it trains fine, Tensorflows model.evaluate() also works. As I am trying to compare the results to some self written models, I need to use exactly the same metrics. So I tried to get a prediction for every interaction and then calculate it my way.
This led to problems with the format of the data, as I have to give the user_id as well as the weekday. So I tried going back to the aforementioned example and get results either by using model.predict() or by using tfrs.layers.factorized_top_k.BruteForce() as described for example here.
In the first mentioned notebook, I added the following code at the end:
index = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
index.index_from_dataset(
tf.data.Dataset.zip((movies.batch(100),
movies.batch(100).map(model.candidate_model)))
)
predictions_1 = index(ratings, 10)
predictions_2 = model.predict(cached_test)
BruteForce Way
Trying to get predictions_1 gives me
'CacheDataset' object is not subscriptable
in the call() of the UserModel. I understand that this is caused by trying to access inputs[something] when inputs won't allow accessing via indexing like this. But just don't know what is the correct way to use instead. I tried creating other Dataset (like MapDataset etc.) objects but none of them are subscriptable. Then I tried building up a Tensor and access it with indexing [0, :] for user_id etc. Does not work either because the the Sequential Layer can't handle slices. Converting to numpy does not work either.
model.predict way
Trying to get predictions_2, I implemented the call()-function in the MovieLensModel as described here:
def call(self, inputs):
query_embeddings = self.query_model({
"user_id": inputs["user_id"],
"timestamp": inputs["timestamp"],
})
movie_embeddings = self.candidate_model(inputs["movie_title"])
return tf.matmul(query_embeddings, movie_embeddings, transpose_a=True)
I know that this can not the final or correct way, but see it as a first try. I am not trying to already get the result but some kind of interaction matrix.
However, I get a result but it is in the shape of (160, 32). As 32 is the embedding dimension and both users as well as movies are much more (942 and 1425) in the testing data, I don't know how I get 160. Both embedding results have (None, 32) as shape. Thought about batches but then I should have multiple subtensors in the result.
Moreover, I have the problem that the debugger does not step into the named call()-function, but somehow I can print debug from there? So it seems to be executed but I can't go in there?
Questions
Has anyone used the TFRS with context features and found a way to predict item values for a (user, feature)-combination?
Does anyone have any idea which Datatype I can use for the first prediction try?
How is the idea to the second approach wrong? How should it be done?
Are those even good ideas or is this completely a wrong approach cause I'm missing something?
EDIT:
I found out that one of the problems is the feeding of batches to the BruteForce-Layer. If I do the following, I get reasonable results in form of a tensor containing k movie titles and the corresponding ratings:
for batch in cached_test:
predictions = index(batch, 10)
Nevertheless, this cannot be the preferred way as I get warnings cause I feed a dict to the model.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: ...
So this seems like a workaround and still there is the question for the intended way to do this.
Versions
I am running:
tensorflow: 2.7.0
tensorflow-recommenders: 0.6.0
python: 3.8.5
I was facing the similar issue. I resolved this issue by passing data in dict format through a dataframe
prediction_1 = index(dict(df.loc[i,['user_id','weekday']].map(lambda x:
tf.expand_dims(x,axis=0))))
Note: If your issue is not resolved yet, share your model code...
Related
I have created a prediction model and used RNN in it offered by the tensorflow library in Python. Here is the complete code I have created and tried:
Jupyter Notbook of the Code
But I have doubts.
1) Whether RNN is correct for what I am trying to predict?
2) Is there a better algorithm I can try?
3) Can anyone suggest me how I can give multiple inputs and get the necessary output using tensorflow model? Can anyone guide me please.
I hope I am clear on my points. Please do tell me if anything else required.
Having doubts is normal, but you should try to measure them before asking for advice. If you don't have a clear thing you want to improve it's unlikely you will get something better.
1) Whether RNN is correct for what I am trying to predict?
Yes. RNN is used appropriately here. If you don't care much about having arbitrary length input sequences, you can also try to force them to a fixed size and then apply convolutions on top (see convolutional NeuralNetworks), or even try with a more simple DNN.
The more important question to ask yourself is if you have the right inputs and if you have sufficient training data to learn what you hope to learn.
2) Is there a better algorithm I can try?
Probably no. As I said RNN seems appropriate for this problem. Do try some hyper parameter tuning to make sure you don't accidentally just pick a sub-optimal configuration.
3) Can anyone suggest me how I can give multiple inputs and get the necessary output using tensorflow model? Can anyone guide me please.
The common way to handle variable length inputs is to set a max length and pad the shorter examples until they reach that length. The max length can be a variable you pick or you can dynamically set it to the largest length in the batch. This is needed only because the internal operations are done in batches. You can pick which results you want. Picking the last one is reasonable (the model will just have to learn to propagate the state for the padding values). Another reasonable thing to do is to pick the first one you get after feeding the last meaningful value into the RNN.
Looking at your code, there's one thing I would improve:
Instead of computing a loss on the last value only, I would compute it over all values in the series. This gives your model more training data with very little performance degradation.
Maybe this is a stupid question, but I switched from basic TensorFlow recently to tflearn and while I knew little of TensorFlow, I know even less of tflearn as I have just begun to experiment with it. I was able to create a network, train it, and generate a model that achieved a satisfactory metric. I did this all without using a TensorFlow session because a) none of the documentation I was looking at necessarily suggested it and b) I didn't even think to use it.
However, I would like to predict a value for a single input (the model performs regression on images, so I'm trying to get a value for a single image) and now I'm getting an error that the convolutional layers need to be initialized (Specifically "FailedPreconditionError: Attempting to use uninitialized value Conv2D/W").
The only thing I've added, though, are two lines:
model = Evaluator(network)
model.predict(feed_dict={input_placeholder: image_data})
I'm asking this as a general question because my actual code is a bit troublesome to just post here because admittedly I've been very sloppy in writing it. I will mention, however, that even if I start a session and initialize all variables before that second line, then run the line in the session, I get the same error.
Succinctly put, does tflearn require a session if I've not used TensorFlow stuff directly anywhere in my code? If so, does the model need to be trained in the session? And if not, what about those two lines would cause such an error?
I'm hoping it isn't necessary for more code to be posted, but if this isn't a general issue and is actually specific to my code then I can try to format it to be understandable here and then edit the post.
I'm building a neural network that is supposed to classify input words in some way. Without going into much detail on the network itself, I was looking for a way to convert my input words to an integer format, in order to use TensorFlow's tf.nn.embedding_lookup(...) for input encoding.
I noticed that tf.string_to_number() exists, so I tried using that, but it failed. First I thought it was related to what I'm doing in my network, but even when doing something like
import tensorflow as tf
s = tf.string_to_number("TEST", out_type=tf.int32)
sess = tf.InteractiveSession()
sess.run(s)
in a python console, I get the same error of
tensorflow.python.framework.errors.InvalidArgumentError:
StringToNumberOp could not correctly convert string: TEST
I also tried creating a tf.constant("TEST", dtype=tf.string) first and passing that on to tf.string_to_number() and ran this test code on a webserver to make sure it wasn't related to my setup, but with the same result.
Can anyone tell me what I'm missing here? Thanks in advance!
Can anyone tell me what I'm missing here?
You are missing the purpose of string_to_number it is supposed to convert a number, represented as string, to the numerical type, like tf.string_to_number('1'), it is not "one hot encoder" for strings (how would it be able to figure out the size in the vocab in the first place?)
There is a nice tutorial in tensorflow itself which shows how to train embedding models in word2vec_basic.py which goes through everything, starting with data reading and ending with full embedding using the lookup op.
I am trying to follow the tutorial for Language Modeling on the TensorFlow site. I see it runs and the cost goes down and it is working great, but I do not see any way to actually get the predictions from the model. I tried following the instructions at this answer but the tensors returned from session.run are floating point values like 0.017842259, and the dictionary maps words to integers so that does not work.
How can I get the predicted word from a tensorflow model?
Edit: I found this explanation after searching around, I just am not sure what x and y would be in the context of this example. They don't seem to use the same conventions for this example as they do in the explanation.
The tensor you are mentioning is the loss, which defines how the network is training. For prediction, you need to access the tensor probabilities which contain the probabilities for the next word. If this was classification problem, you'd just do argmax to get the top probability. But, to also give lower probability words a chance of being generated,some kind of sampling is often used.
Edit: I assume the code you used is this. In that case, if you look at line 148 (logits) which can be converted into probabilities by simply applying the softmax function to it -- like shown in the pseudocode in tensorflow website. Hope this helps.
So after going through a bunch of other similar posts I figured this out. First, the code explained in the documentation is not the same as the code on the GitHub repository. The current code works by initializing models with data inside instead of passing data to the model as it goes along.
So basically to accomplish what I was trying to do, I reverted my code to commit 9274f5a (also do the same for reader.py). Then I followed the steps taken in this post to get the probabilities tensor in my run_epoch function. Additionally, I followed this answer to pass the vocabulary to my main function. From there, I inverted the dict using vocabulary = {v: k for k, v in vocabulary.items()} and passed it to run_epoch.
Finally, we can get the predicted word in run_epoch by running current_word = vocabulary[np.argmax(prob, 1)] where prob is the tensor returned from session.run()
Edit: Reverting the code as such should not be a permanent solution and I definitely recommend using #Prophecies answer above to get the probabilities tensor. However, if you want to get the word mapping, you will need to pass the vocabulary as I did here.
I try to get reliable features for ImageNet to do further classification on them. To achieve that I would like to use tensorflow with Alexnet, for feature extraction. That means I would like to get the values from the last layer in the CNN. Could someone write a piece of Python code that explains how that works?
As jonrsharpe mentioned, that's not really stackoverflow's MO, but in practice, many people do choose to write code to help explain answers (because it's often easier).
So I'm going to assume that this was just miscommunication, and you really intended to ask one of the following two questions:
How does one grab the values of the last layer of Alexnet in TensorFlow?
How does feature extraction from the last layer of a deep convolutional network like alexnet work?
The answer to the first question is actually very easy. I'll use the cifar10 example code in TensorFlow (which is loosely based on AlexNet) as an example. The forward pass of the network is built in the inference function, which returns a variable representing the output of the softmax layer. To actually get predicted image labels, you just argmax the logits, like this: (I've left out some of the setup code, but if you're already running alexnet, you already have that working)
logits = cifar10.inference(images)
predictions = tf.argmax(logits,1)
# Actually run the computation
labels = session.run([predictions])
So grabbing just the last layer features is literally just as easy as asking for them. The only wrinkle is that, in this case, cifar10 doesn't natively expose them, so you need to modify the cifar10.inference function to return both:
# old code in cifar10.inference:
# return softmax_linear
# new code in cifar10.inference:
return softmax_linear, local4
And then modify all the calls to cifar10.inference, like the one we just showed:
logits,local4 = cifar10.inference(images)
predictions = tf.argmax(logits,1)
# Actually run the computation, this time asking for both answers
labels,last_layer = session.run([predictions, local4])
And that's it. last_layer contains the last layer for all of the inputs you gave the model.
As for the second question, that's a much deeper question, but I'm guessing that's why you want to work on it. I'd suggest starting by reading up on some of the papers published in this area. I'm not an expert here, but I do like Bolei Zhou's work. For instance, try looking at Figure 2 in "Learning Deep Features for Discriminative Localization". It's a localization paper, but it's using very similar techniques (and several of Bolei's papers use it).