I'm building a neural network that is supposed to classify input words in some way. Without going into much detail on the network itself, I was looking for a way to convert my input words to an integer format, in order to use TensorFlow's tf.nn.embedding_lookup(...) for input encoding.
I noticed that tf.string_to_number() exists, so I tried using that, but it failed. First I thought it was related to what I'm doing in my network, but even when doing something like
import tensorflow as tf
s = tf.string_to_number("TEST", out_type=tf.int32)
sess = tf.InteractiveSession()
sess.run(s)
in a python console, I get the same error of
tensorflow.python.framework.errors.InvalidArgumentError:
StringToNumberOp could not correctly convert string: TEST
I also tried creating a tf.constant("TEST", dtype=tf.string) first and passing that on to tf.string_to_number() and ran this test code on a webserver to make sure it wasn't related to my setup, but with the same result.
Can anyone tell me what I'm missing here? Thanks in advance!
Can anyone tell me what I'm missing here?
You are missing the purpose of string_to_number: it converts a number represented as a string into a numeric type, e.g. tf.string_to_number('1'). It is not a one-hot encoder for strings (how would it figure out the vocabulary size in the first place?).
There is a nice tutorial in TensorFlow itself which shows how to train embedding models: word2vec_basic.py goes through everything, starting with data reading and ending with a full embedding using the lookup op.
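In the meantime, here is a minimal sketch of the usual approach, assuming you build the word-to-id mapping yourself (the toy vocabulary and embedding size below are made up for illustration):

import tensorflow as tf

# Toy vocabulary: map each word to an integer id yourself,
# then feed the ids to tf.nn.embedding_lookup.
words = ["the", "quick", "brown", "fox"]
word_to_id = {w: i for i, w in enumerate(words)}

ids = tf.constant([word_to_id["quick"], word_to_id["fox"]], dtype=tf.int32)
embeddings = tf.Variable(tf.random_uniform([len(words), 8], -1.0, 1.0))
looked_up = tf.nn.embedding_lookup(embeddings, ids)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(looked_up).shape)  # (2, 8)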
Related
I am currently trying to build a recommender system with TensorFlow on my own dataset (user, item, weekday). I have a first version that just uses user-item-interactions as a basis. Now I want to extend it with a context feature (weekday of interaction) like here.
I adapted my model and it trains fine; TensorFlow's model.evaluate() also works. As I am trying to compare the results to some self-written models, I need to use exactly the same metrics. So I tried to get a prediction for every interaction and then calculate the metrics my way.
This led to problems with the format of the data, as I have to provide the user_id as well as the weekday. So I tried going back to the aforementioned example and getting results either by using model.predict() or by using tfrs.layers.factorized_top_k.BruteForce() as described, for example, here.
In the first mentioned notebook, I added the following code at the end:
index = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
index.index_from_dataset(
    tf.data.Dataset.zip((movies.batch(100),
                         movies.batch(100).map(model.candidate_model)))
)
predictions_1 = index(ratings, 10)
predictions_2 = model.predict(cached_test)
BruteForce Way
Trying to get predictions_1 gives me
'CacheDataset' object is not subscriptable
in the call() of the UserModel. I understand that this is caused by trying to access inputs[something] when inputs does not support indexing like that, but I just don't know what the correct input type to use instead would be. I tried creating other Dataset objects (like MapDataset etc.), but none of them are subscriptable. Then I tried building up a Tensor and accessing it with indexing like [0, :] for user_id etc. That does not work either, because the Sequential layer can't handle slices. Converting to numpy does not work either.
model.predict way
Trying to get predictions_2, I implemented the call()-function in the MovieLensModel as described here:
def call(self, inputs):
    query_embeddings = self.query_model({
        "user_id": inputs["user_id"],
        "timestamp": inputs["timestamp"],
    })
    movie_embeddings = self.candidate_model(inputs["movie_title"])
    return tf.matmul(query_embeddings, movie_embeddings, transpose_a=True)
I know that this cannot be the final or correct way, but see it as a first try. I am not trying to get the final result yet, just some kind of interaction matrix.
However, I do get a result, but it has shape (160, 32). Since 32 is the embedding dimension and there are far more users and movies (942 and 1425) in the test data, I don't know where 160 comes from. Both embedding results have shape (None, 32). I thought about batches, but then I should have multiple sub-tensors in the result.
Moreover, the debugger does not step into the named call() function, yet I can print debug output from inside it. So it seems to be executed, but I can't step into it?
Questions
Has anyone used the TFRS with context features and found a way to predict item values for a (user, feature)-combination?
Does anyone have an idea which data type I can use for the first prediction attempt?
In what way is the idea behind the second approach wrong? How should it be done?
Are those even good ideas, or is this a completely wrong approach because I'm missing something?
EDIT:
I found out that one of the problems is the feeding of batches to the BruteForce layer. If I do the following, I get reasonable results in the form of a tensor containing the top k movie titles and the corresponding ratings:
for batch in cached_test:
    predictions = index(batch, 10)
Nevertheless, this cannot be the preferred way, as I get warnings because I feed a dict to the model.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: ...
So this seems like a workaround, and the question of the intended way to do this remains.
Versions
I am running:
tensorflow: 2.7.0
tensorflow-recommenders: 0.6.0
python: 3.8.5
I was facing a similar issue. I resolved it by passing the data in dict format through a DataFrame:
prediction_1 = index(dict(df.loc[i, ['user_id', 'weekday']].map(
    lambda x: tf.expand_dims(x, axis=0))))
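Equivalently, a minimal sketch that builds the query dict by hand (the feature names and values below are assumptions based on the question's dataset, not part of the original answer):

import tensorflow as tf

# Hypothetical (user, weekday) query; the query model must expect these keys.
query = {
    "user_id": tf.constant(["42"]),  # user id as a batch of one
    "weekday": tf.constant([3]),     # context feature as a batch of one
}

# BruteForce returns the top-k scores and the corresponding candidate identifiers.
scores, titles = index(query, k=10)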
Note: If your issue is not resolved yet, share your model code...
So I am basically using this transformer implementation for my project: https://github.com/Kyubyong/transformer .
It works great on the German-to-English translation it was originally written for, and I modified the preprocessing Python script to create vocabulary files for the languages that I want to translate. That seems to work fine.
However when it comes to training I get the following error:
InvalidArgumentError (see above for traceback): Restoring from
checkpoint failed. This is most likely due to a mismatch between the
current graph and the graph from the checkpoint. Please ensure that
you have not altered the graph expected based on the checkpoint.
Original error:
Assign requires shapes of both tensors to match. lhs shape= [9796,512]
rhs shape= [9786,512] [[{{node save/Assign_412}} =
Assign[T=DT_FLOAT, _class=["loc:#encoder/enc_embed/lookup_table"],
use_locking=true, validate_shape=true,
_device="/job:localhost/replica:0/task:0/device:CPU:0"](encoder/enc_embed/lookup_table/Adam_1,
save/RestoreV2:412)]]
Now I have no idea why I am getting the above error. I also reverted to the original code to translate from German to English, and now I get the same error (except the lhs and rhs tensor shapes are different, of course), when before it was working!
Any ideas on why this could be happening?
Thanks in advance
EDIT: This is the specific file in question here, the train.py when it is run: https://github.com/Kyubyong/transformer/blob/master/train.py
Nothing has been modified other than the fact that the vocabs loaded for de and en are different (they are in fact vocab files with single letters as words). However, as I mentioned, even when reverting back to the previously working example I get the same error, just with different lhs and rhs dimensions.
I was getting a similar error. In my case it seems that the output of previous failed jobs was still in the output dir, and there were some incompatibilities when saving/restoring the checkpoints of the new job, so I just cleaned up the output dir and then the new job worked correctly.
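If you want to do the cleanup from Python, a one-liner along these lines works (the directory name is just a placeholder for your own output dir):

import shutil

# Remove stale checkpoints/summaries from earlier runs before starting a new job.
shutil.rmtree("output_dir", ignore_errors=True)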
I was facing the same issue while exporting/saving the model. I was referring to the example given at this URL: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md
There are three things you have to make sure are correct if you are facing the above issue:
Clean up the model directory and extract a fresh model.
Make sure that you are using the correct pair of pipeline config file and its corresponding TF model.
Use the correct model checkpoint; see the example below.
I updated my TRAINED_CKPT_PREFIX value to the saving point of my model and it worked for me:
TRAINED_CKPT_PREFIX=./data/model.ckpt-139
In your case, please use your own saving point number; in my case it is 139.
Previously I was using only ./data/model.ckpt, which was not working.
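If you are not sure what the latest saving point number is, a small sketch like this (using tf.train.latest_checkpoint, with a placeholder directory) will print it:

import tensorflow as tf

# Prints something like ./data/model.ckpt-139; use that as TRAINED_CKPT_PREFIX.
print(tf.train.latest_checkpoint("./data"))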
The large number is almost certainly the size of your vocabulary. The initial matrix will have size [vocab_size, hidden_dim]. So by changing the size of your vocab you are breaking things.
Presumably the solution is just to make sure you clean out all your checkpoints so that you are only looking at models trained with the vocabulary you want.
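A quick way to check which vocabulary size a checkpoint was trained with is to list its variables (the path below is a placeholder for your own logdir):

import tensorflow as tf

ckpt = tf.train.latest_checkpoint("logdir")
for name, shape in tf.train.list_variables(ckpt):
    if "lookup_table" in name:
        # The first dimension should match the vocab size you are training with.
        print(name, shape)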
I am trying to check if my .onnx model is correct, and need to run inference to verify the output for the same.
I know we can run validation on .mlmodel using coremltools in Python - basically load the model and input and get the prediction. I am trying to do a similar thing for the .onnx model.
I found the MXNet framework but I can't seem to understand how to import the model - I just have the .onnx file and MXNet requires some extra input besides the onnx model.
Is there any other simple way to do this in Python? I am guessing this is a common problem but can't seem to find any relevant libraries/frameworks to do this as easily as coremltools for .mlmodel.
I do not wish to convert .onnx to another type of model (like say PyTorch) as I want to check the .onnx model as is, not worrying if the conversion was correct. Just need a way to load the model and input, run inference and print the output.
This is my first time encountering these formats, so any help or insight would be appreciated.
Thanks!
I figured out a way to do this using Caffe2 - just posting in case someone in the future tries to do the same thing.
The main code snippet is:
import onnx
import caffe2.python.onnx.backend
from caffe2.python import core, workspace
import numpy as np
# Build an input NumPy array of the dimensions and dtype required by the model;
# the shape below is just a placeholder (e.g. one 224x224 RGB image).
inputArray = np.random.rand(1, 3, 224, 224)

modelFile = onnx.load('model.onnx')
output = caffe2.python.onnx.backend.run_model(modelFile, inputArray.astype(np.float32))
Also it is important to note that the input to run_model can only be a numpy array or a string. The output will be an object of the Backend.Outputs type. I was able to extract the output numpy array from it.
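For a single-output model, the array can usually be pulled out by index (a sketch, not a guaranteed API shape; the exact outputs depend on your model):

# Backend.Outputs behaves like a (named) tuple of the model's outputs;
# for a single-output model the first element is the result array.
outputArray = output[0]
print(outputArray.shape)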
I was able to execute inference on the CPU, and hence did not need the Caffe2 installation with GPU support (requiring CUDA and cuDNN).
Maybe this is a stupid question, but I recently switched from basic TensorFlow to tflearn, and while I knew little of TensorFlow, I know even less of tflearn since I have just begun to experiment with it. I was able to create a network, train it, and generate a model that achieved a satisfactory metric. I did all this without using a TensorFlow session, because a) none of the documentation I was looking at necessarily suggested it and b) I didn't even think to use it.
However, I would like to predict a value for a single input (the model performs regression on images, so I'm trying to get a value for a single image) and now I'm getting an error that the convolutional layers need to be initialized (Specifically "FailedPreconditionError: Attempting to use uninitialized value Conv2D/W").
The only thing I've added, though, are two lines:
model = Evaluator(network)
model.predict(feed_dict={input_placeholder: image_data})
I'm asking this as a general question because my actual code is a bit troublesome to just post here because admittedly I've been very sloppy in writing it. I will mention, however, that even if I start a session and initialize all variables before that second line, then run the line in the session, I get the same error.
Succinctly put, does tflearn require a session if I've not used TensorFlow stuff directly anywhere in my code? If so, does the model need to be trained in the session? And if not, what about those two lines would cause such an error?
I'm hoping it isn't necessary for more code to be posted, but if this isn't a general issue and is actually specific to my code then I can try to format it to be understandable here and then edit the post.
I am trying to follow the tutorial for Language Modeling on the TensorFlow site. I see it runs and the cost goes down and it is working great, but I do not see any way to actually get the predictions from the model. I tried following the instructions at this answer but the tensors returned from session.run are floating point values like 0.017842259, and the dictionary maps words to integers so that does not work.
How can I get the predicted word from a tensorflow model?
Edit: I found this explanation after searching around, I just am not sure what x and y would be in the context of this example. They don't seem to use the same conventions for this example as they do in the explanation.
The tensor you are mentioning is the loss, which defines how the network is trained. For prediction, you need to access the probabilities tensor, which contains the probabilities for the next word. If this were a classification problem, you'd just do argmax to get the top probability. But to also give lower-probability words a chance of being generated, some kind of sampling is often used.
Edit: I assume the code you used is this. In that case, look at line 148 (logits), which can be converted into probabilities by simply applying the softmax function to it, as shown in the pseudocode on the TensorFlow website. Hope this helps.
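A minimal, self-contained sketch of that last step, with a toy vocabulary and made-up logits standing in for the model's output:

import numpy as np
import tensorflow as tf

# Toy example: logits over a 5-word vocabulary (values are illustrative).
id_to_word = {0: "the", 1: "cat", 2: "sat", 3: "on", 4: "mat"}
logits = tf.constant([[0.1, 2.5, 0.3, 0.2, 0.1]])

probabilities = tf.nn.softmax(logits)            # shape (1, vocab_size)
predicted_id = tf.argmax(probabilities, axis=1)  # greedy: most likely word id

with tf.Session() as sess:
    probs, ids = sess.run([probabilities, predicted_id])
    print(id_to_word[ids[0]])  # -> "cat"
    # Sampling alternative, so lower-probability words can also be generated:
    p = probs[0].astype(np.float64)
    p /= p.sum()
    print(id_to_word[np.random.choice(len(p), p=p)])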
So after going through a bunch of other similar posts, I figured this out. First, the code explained in the documentation is not the same as the code in the GitHub repository. The current code works by initializing models with the data inside instead of passing data to the model as it goes along.
So basically to accomplish what I was trying to do, I reverted my code to commit 9274f5a (also do the same for reader.py). Then I followed the steps taken in this post to get the probabilities tensor in my run_epoch function. Additionally, I followed this answer to pass the vocabulary to my main function. From there, I inverted the dict using vocabulary = {v: k for k, v in vocabulary.items()} and passed it to run_epoch.
Finally, we can get the predicted word in run_epoch by running current_word = vocabulary[np.argmax(prob, 1)], where prob is the tensor returned from session.run().
Edit: Reverting the code like this should not be a permanent solution, and I definitely recommend using @Prophecies' answer above to get the probabilities tensor. However, if you want the word mapping, you will need to pass the vocabulary as I did here.