First of all: I would like to build a recommender system that uses a neural network to predict which items user X is most likely to buy.
I have already trained a model on the appropriate datasets using the NeuMF model (you can also see the different layers in the picture).
[Source https://arxiv.org/abs/1708.05031]
My dataset contains the following:
The column event contains whether the user has looked at an item (view), placed it in the shopping cart (addtocart) or bought it (transaction).
I have already found example implementations of how recommendations are derived from such a model. The following was written about one of them:
Now that I've trained my model, I'm ready to recommend songs for a
given playlist! However, one issue that I encountered (see below) is
that I need the embedding of that new playlist (as stored in my model)
in order to find the closest relevant playlists in that embedding
space using kmeans. I am not sure how to get around this issue- as is,
it seems that I have to retrain my whole model each time I get an
input playlist in order to get that playlist embedding. Therefore, I
just test my model on a randomly chosen playlist (which happens to be
rock and oldies, mostly!) from the training set.
To recommend songs, I first cluster the learned embeddings for all of
the training playlists, and then select "neighbor" playlists for my
given test playlist as all of the other playlists in that same
cluster. I then take all of the tracks from these playlists and feed
the test playlist embedding and these "neighboring" tracks into my
model for prediction. This ranks the "neighboring" tracks by how
likely they are (under my model) to occur next in the given test
playlist.
[Source https://github.com/caravanuden/spotify_recsys]
I've just trained my model and now I'd like to make a recommendation as to which items user X is most likely to buy.
Do I have to implement another algorithm on top, for example one that determines the nearest neighbors (kNN), or is it sufficient to train the model and then derive the recommendations from it directly?
How do I proceed after training the model on the data, and how do I get the recommendations from it? What is the state of the art for obtaining recommendations from a trained model?
Thanks in advance. Looking forward to suggestions, ideas and answers.
It depends on your use case for the model. This is twofold: first because of the performance (speed) your specific use case requires, and second because of what I consider the main weakness of the NeuMF model: if a user interacts with more items after training, the predictions will not change, since those interactions were not part of the training data. Because of this, if the model is used in a real-time online setting, the recommendations will essentially be based on previous behavior and will not take the current session into account unless the model is retrained.
The NeuMF model is particularly well suited to batch predictions for interval recommendations. If, for example, you would like to recommend items to users in a weekly email, then for each user you would predict the output probability for each item, select the top n (e.g. 10) probabilities, and recommend those items. (You would have to retrain the model the following week to get new predictions based on the users' latest item interactions.) So if there are 10,000 unique items, you make 10,000 individual predictions per user and recommend n items based on those scores. The main drawback, of course, is that these 10,000 predictions take a while to compute, so the approach may not be suitable for real-time online predictions. On the other hand, if you parallelize the predictions cleverly, this limitation can be overcome as well, although that may be unnecessary: as explained above, the predictions will not change in response to current user interactions anyway.
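The score-every-item-then-take-top-n loop described above can be sketched as follows. Here `predict_fn` is a stand-in for your trained NeuMF model's batch prediction (the toy random scores just make the sketch runnable):

```python
import numpy as np

def top_n_recommendations(predict_fn, user_id, item_ids, n=10):
    """Score every item for one user and return the n highest-scoring item ids.

    predict_fn(user_id, item_ids) -> array of purchase probabilities,
    standing in for a trained NeuMF model's batch prediction.
    """
    scores = np.asarray(predict_fn(user_id, item_ids))
    top_idx = np.argsort(scores)[::-1][:n]   # indices of the n largest scores
    return [item_ids[i] for i in top_idx]

# Toy stand-in model: fixed pseudo-random scores per (user, item) pair.
rng = np.random.default_rng(0)
fake_scores = rng.random((5, 100))           # 5 users x 100 items
predict_fn = lambda u, items: fake_scores[u, items]

items = list(range(100))
recs = top_n_recommendations(predict_fn, user_id=3, item_ids=items, n=10)
print(recs)
```

In a real setup you would replace `fake_scores` with a batched call to the trained model, which is also where the parallelization mentioned above would come in.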
Using kNN to cluster users in the embedding space, then taking those users' items and feeding them into the model, seems unnecessary and, in my opinion, defeats the purpose of the whole model architecture. The whole point of the NeuMF model is to generalize a given user's item interactions across all the other users' interactions and base the recommendations on that, so that, given a user and an item, you can get the probability for that specific item directly.
Related
I would like to create word embeddings that take context into account, so the vector of the word Jaguar [animal] would be different from the word Jaguar [car brand].
As you know, word2vec only gives one representation per word, and I would like to take already pretrained embeddings and enrich them with context. So far I've tried a simple approach: taking the average of the vectors for the word and its category word, for example like this.
Now I would like to try to create and train a neural network that would take entire sentences, e.g.
Jaguar F-PACE is a great SUV sports car.
Among cats, only tigers and lions are bigger than jaguars.
It would then perform text classification (I have a dataset with several categories, such as animals, cars, etc.), but the result would be new representations of the word jaguar in different contexts, i.e. two different embeddings.
Does anyone have any idea how I could build such a network? I won't hide that I'm a beginner and have no idea how to go about it.
If you've already been able to perform sense disambiguation outside word2vec, then you can change the word tokens to reflect your external judgement. For example, change some appearances of the token 'jaguar' to 'jaguar*car' and others to 'jaguar*animal'. Proceeding with normal word2vec training will then give your two different tokens two different word vectors.
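As a minimal sketch, the token rewrite could be done with a hand-built sense rule before feeding the corpus to word2vec. The cue-word rule here is a toy assumption; in practice the sense decision would come from your external disambiguation step:

```python
# Toy disambiguation rule: tag 'jaguar' by which cue words appear in the sentence.
CAR_CUES = {"suv", "car", "engine", "f-pace"}
ANIMAL_CUES = {"cats", "tigers", "lions", "prey"}

def tag_senses(tokens):
    """Replace ambiguous 'jaguar' tokens with sense-specific tokens."""
    out = []
    token_set = set(tokens)
    for tok in tokens:
        if tok == "jaguar":
            if token_set & CAR_CUES:
                out.append("jaguar*car")
            elif token_set & ANIMAL_CUES:
                out.append("jaguar*animal")
            else:
                out.append(tok)  # sense unknown: leave the token as-is
        else:
            out.append(tok)
    return out

sent1 = "jaguar f-pace is a great suv sports car".split()
sent2 = "among cats only tigers and lions are bigger than jaguar".split()
print(tag_senses(sent1))
print(tag_senses(sent2))
```

Training a standard word2vec implementation (e.g. gensim's `Word2Vec`) on the rewritten sentences then yields separate vectors for `jaguar*car` and `jaguar*animal`.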
If you're hoping for the training to discover these itself, as ~Erwan mentioned in a comment, that seems like an open research question, without a standard or off-the-shelf solution that a beginner could drop-in.
I'd once seen a paper (from around the time of the original word2vec papers, though I can't find the link now) that tried to do this in a word2vec-compatible way by first proceeding with traditional polysemy-oblivious training. Then, for every appearance of a word X, it modeled the surrounding context via some combination of the word vectors of neighbors within a certain number of positions. (That in itself is very similar to the preparation of a context vector in the CBOW mode of word2vec.) It performed some clustering on that collection of all contexts to come up with alternate senses, each associated with one cluster. Then, in a follow-up pass on the original corpus, word tokens were replaced with ones that also reflect their nearby-context cluster. (E.g., 'jaguar' might be replaced with 'jaguar*1', 'jaguar*2', etc., based on which discrete cluster its context suggested.) Finally, word2vec training was repeated (or continued) to get sense-specific word vectors. Of course, the devil is in the details of how contexts are defined, how clusters are deduced, and the tough edge cases (where the text's author may themselves be playing on the multiple senses).
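A rough sketch of that pipeline, with toy two-dimensional word vectors and a tiny hand-rolled k-means so it stays self-contained (real pretrained vectors and a library clusterer would replace the toy pieces):

```python
import numpy as np

def context_vector(tokens, pos, vectors, window=2):
    """Average the word vectors of neighbors within `window` positions."""
    idxs = [i for i in range(max(0, pos - window), min(len(tokens), pos + window + 1))
            if i != pos and tokens[i] in vectors]
    return np.mean([vectors[tokens[i]] for i in idxs], axis=0)

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means returning a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy pretrained vectors: car-ish words near [1, 0], animal-ish near [0, 1].
vectors = {
    "car": np.array([1.0, 0.0]), "suv": np.array([0.9, 0.1]),
    "engine": np.array([1.0, 0.2]),
    "cats": np.array([0.0, 1.0]), "lions": np.array([0.1, 0.9]),
    "prey": np.array([0.0, 1.1]),
}
corpus = [
    "the jaguar suv car engine".split(),
    "a jaguar hunts prey like lions cats".split(),
]

# Collect a context vector for every appearance of the ambiguous word.
contexts, positions = [], []
for s, toks in enumerate(corpus):
    for i, t in enumerate(toks):
        if t == "jaguar":
            contexts.append(context_vector(toks, i, vectors))
            positions.append((s, i))

# Cluster the contexts, then rewrite tokens as 'jaguar*<cluster>'.
labels = kmeans(np.array(contexts), k=2)
for (s, i), lab in zip(positions, labels):
    corpus[s][i] = f"jaguar*{lab}"
print(corpus)
```

A further word2vec pass over the rewritten corpus would then learn one vector per sense token.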
Some other interesting efforts to model or deduce polysemy in word2vec models:
"Linear Algebraic Structure of Word Meanings"
"A Simple Approach to Learn Polysemous Word Embeddings"
But per above, I've not seen these sorts of techniques widely implemented/adopted in a form that's easy to drop-in to another project.
Likely a quite trivial question:
I am using Python Surprise to build a classic recommender system. I have a training dataset that I can use to build my model. Once trained, I would like to predict ratings that users will give to items. These users are new to the system, so they are not in my training set. I would like to generate recommendations for them on the fly after they have rated a few items and without re-training the model.
So basically my code is:
from surprise import SVD

# `data` is a surprise.Dataset loaded earlier, e.g. via Dataset.load_from_df
trainset = data.build_full_trainset()
svd = SVD(verbose=True, n_epochs=10)
svd.fit(trainset)
res = svd.predict(uid=5, iid="0")
But instead of predicting the user with uid=5 from the data set, I would like to add a new user and a few ratings given by that user and then predict other ratings for that user. Is that possible in general and in particular with Surprise?
I went through this case study of Structural Time Series Modeling in TensorFlow, but I couldn't find a way to add future values of features. I would like to add a holidays effect, but when I follow these steps the holidays start to repeat in the forecast period.
Below is a visualisation from the case study; you can see that temperature_effect starts from the beginning.
Is it possible to feed the model with actual future data?
Edit:
In my case, holidays started to repeat in my forecast, which does not make sense.
I have just found an issue on GitHub referring to this problem; there is a workaround for it.
There is a slight fallacy in what you are asking. As mentioned in my comment, when predicting with a model, future data does not exist because it just hasn't happened yet. For any model, it is not possible to feed data that does not exist. However, you could use an autoregressive approach, as described in the link above, to feed 'future' data. A pseudo-example would be as follows:
Model 1: STS model with inputs x_in and x_future to predict y_future.
You could stack this with a secondary helper model that predicts x_future from x_in.
Model 2: Regression model with input x_in predicting x_future.
Chaining these models will then allow your STS model to take 'future' feature values into account. On the other hand, in your question you mention a holiday effect. You could simply add another input where you define, via some if/else case, whether the holiday effect is active or inactive. You could also use random sampling of your holiday effect, which might help. To help you more concretely with code or a model, I would need more details on your model, inputs, and outputs.
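For the simple holiday indicator mentioned above, the key point is that the feature must cover both the observed period and the forecast horizon, so the model sees the true future holiday positions instead of repeating the training-period pattern. A minimal sketch (the dates, holiday positions, and variable names are made up):

```python
import numpy as np

num_observed = 10          # length of the training series
num_forecast_steps = 5     # steps to forecast
holiday_idx = {3, 12}      # toy holiday positions, incl. one in the forecast window

# Indicator spanning BOTH observed and forecast periods.
total_steps = num_observed + num_forecast_steps
holiday_dummy = np.array([1.0 if t in holiday_idx else 0.0
                          for t in range(total_steps)])
design_matrix = holiday_dummy[:, None]   # shape [total_steps, 1]
print(design_matrix.shape)
```

With a regression component such as `tfp.sts.LinearRegression` built from a design matrix over `total_steps`, the model can be fit on the first `num_observed` rows while the remaining rows supply the actual future holiday effect at forecast time (this mirrors the workaround from the GitHub issue mentioned in the question).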
In simple words, you can't work with data that doesn't exist, so you either need to spoof it or obtain it some other way.
The tutorial on forecasting is here: https://www.tensorflow.org/probability/examples/Structural_Time_Series_Modeling_Case_Studies_Atmospheric_CO2_and_Electricity_Demand
You only need to supply the new data and parameters, and specify how many steps into the future to predict.
I have a TensorFlow recommendation system based off TF-recomm. Each user has 1+numFactors numbers associated with her: a vector of numFactors and a single-number offset. Each item likewise has a bias and a vector of numFactors assigned. The TF-recomm code is
def inference_svd(user_batch, item_batch, user_num, item_num, dim=5):
    # Global bias plus per-user and per-item bias terms
    bias_global = tf.get_variable("bias_global", shape=[])
    w_bias_user = tf.get_variable("embd_bias_user", shape=[user_num])
    w_bias_item = tf.get_variable("embd_bias_item", shape=[item_num])
    bias_user = tf.nn.embedding_lookup(w_bias_user, user_batch, name="bias_user")
    bias_item = tf.nn.embedding_lookup(w_bias_item, item_batch, name="bias_item")
    # Latent-factor matrices and the per-batch embeddings looked up from them
    w_user = tf.get_variable("embd_user", shape=[user_num, dim],
                             initializer=tf.truncated_normal_initializer(stddev=0.02))
    w_item = tf.get_variable("embd_item", shape=[item_num, dim],
                             initializer=tf.truncated_normal_initializer(stddev=0.02))
    embd_user = tf.nn.embedding_lookup(w_user, user_batch, name="embedding_user")
    embd_item = tf.nn.embedding_lookup(w_item, item_batch, name="embedding_item")
    # Prediction: dot product of user and item factors plus the three biases
    infer = tf.reduce_sum(tf.multiply(embd_user, embd_item), 1)
    infer = tf.add(infer, bias_global)
    infer = tf.add(infer, bias_user)
    infer = tf.add(infer, bias_item, name="svd_inference")
    # L2 regularization on the embeddings
    regularizer = tf.add(tf.nn.l2_loss(embd_user), tf.nn.l2_loss(embd_item),
                         name="svd_regularizer")
    return infer, regularizer
I have been able to get this code to work, and have been able to link it up with a REST-API.
The problem that I encounter is when I get new users. I know what I want to do:
Add a row to the bias_user, initialized to 0
Add a row to the embd_user, initialized to 0
When users rate new items, we use the same graph but freeze the weights on the items (which I can do with var_list on optimizer.minimize)
However, the weights and biases have their shapes declared ahead of time. All the material I have seen on TensorFlow (training or deploying) allows the weights to change, but doesn't seem to allow the network to grow.
If I implemented this in numpy I would simply add new rows to the appropriate matrices. There are a couple of ways of doing this, such as creating new graphs and variables, but it seems best to reuse the graph used to train the model in the first place (to ensure consistency).
I am looking for a system of "best practices" for dealing with changing the size of embedding tensors, especially for a system that is online where it will have to serve predictions quickly (which prevents expensive operations).
The fundamental difficulty when it comes to adding new users in your system is that you need retraining to be able to give meaningful predictions to new users. Even if you were able to dynamically resize the embedding matrices, what values would you use for the parameters describing the new user?
Taking this into account, you have a couple of options.
Save the weights of the graph, then create a new graph with adjusted dimensions and retrain it on data that includes information on the new user. As you say, this may be too costly to be in your critical path.
Use some sort of fold-in approach. For example, you could initialise the new user's embedding using the average of embeddings of users that have interacted with similar items.
Use a model that doesn't have this problem and that can incorporate new users in a more natural manner.
My recommendation would be the third option. There are classes of models that take the sequence (or set) of user interactions directly when making predictions, and do not rely on you declaring the number of users ahead of time. For example, you could use one of the following:
The AutoRec model: a simple autoencoder model that takes the set of items the user has interacted with as the input.
Session-based Recommendations with Recurrent Neural Networks: a recurrent model that takes as input the sequence of user interactions at prediction time.
Both models naturally handle new users without changes to the computation graph; adding new items will require re-training.
One implementation for the first class of models is here; for the second class, check out my recommender system package Spotlight.
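The fold-in idea from the second option can be sketched with NumPy; the embedding matrix, interaction histories, and function name here are all hypothetical:

```python
import numpy as np

def fold_in_new_user(user_embeddings, interactions_by_user, new_user_items):
    """Initialise a new user's embedding as the average embedding of
    existing users who interacted with at least one of the same items."""
    similar = [u for u, items in interactions_by_user.items()
               if set(items) & set(new_user_items)]
    if not similar:
        # Pure cold start with no overlap: fall back to a zero vector
        return np.zeros(user_embeddings.shape[1])
    return user_embeddings[similar].mean(axis=0)

# Toy data: 3 existing users with 2-d embeddings and their item histories.
user_embeddings = np.array([[1.0, 0.0],
                            [0.0, 1.0],
                            [1.0, 1.0]])
interactions_by_user = {0: [10, 11], 1: [20], 2: [11, 21]}

# New user interacted with item 11, shared with users 0 and 2.
new_vec = fold_in_new_user(user_embeddings, interactions_by_user,
                           new_user_items=[11])
print(new_vec)
```

This avoids retraining in the critical path, at the cost of a cruder initial embedding that can be refined at the next scheduled retraining.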
I have received tens of thousands of user reviews of my app.
I know that many of the comments mean the same thing, and I cannot read them all.
Therefore, I would like to use a Python program to analyze all the comments and identify the most frequent and most important feedback.
How can I do that?
I can already download all of the app's comments, and I have a preliminary understanding of the Google Prediction API.
You can use the Google Prediction API to characterize your comments as important or unimportant. What you'd want to do is manually classify a subset of your comments. Then you upload that manually classified training set to Google Cloud Storage and, using the Prediction API, train your model. This step is asynchronous and can take some time. Once the trained model is ready, you can use it to programmatically classify the remaining (and any future) comments.
Note that the more comments you classify manually (i.e. the larger your training set), the more accurate your programmatic classifications will be. Also, you can extend this idea as follows: instead of a binary classification (important/unimportant), you could use grades of importance, e.g. on a 1-5 scale. Of course, that entails more manual labor in constructing your model so the best strategy will be a function of your needs and how much time you can spend building the model.