It seems like the Predict SignatureDef encompasses all the functionality of the Classification and Regression SignatureDefs. When would there be an advantage to using Classification or Regression SignatureDefs rather than just using Predict for everything? We're looking to keep complexity down in our production environment, and if it's possible to use just Predict SignatureDefs in all cases, that would seem like a good idea.
From what I can see in the documentation (https://www.tensorflow.org/serving/signature_defs), the Classify and Regress SigDefs try to enforce a simple and consistent interface for the simple cases: "inputs" -> "classes" + "scores" for classify, and "inputs" -> "outputs" for regress. There also seems to be the added benefit that the Classify and Regress SigDefs don't require a serving function to be constructed as part of the model export.
Also from the docs, it seems the Predict SigDef allows a more generic interface with the benefit of being able to swap in and out models. From the docs:
Predict SignatureDefs enable portability across models. This means
that you can swap in different SavedModels, possibly with different
underlying Tensor names (e.g. instead of x:0 perhaps you have a new
alternate model with a Tensor z:0), while your clients can stay online
continuously querying the old and new versions of this model without
client-side changes.
Predict SignatureDefs also allow you to add optional additional
Tensors to the outputs, that you can explicitly query. Let's say that
in addition to the output key below of scores, you also wanted to
fetch a pooling layer for debugging or other purposes.
However, the docs don't explain, aside from the minor benefit of not having to export a serving function, why one wouldn't just use the Predict SigDef for everything, since it appears to be a superset with plenty of upside. I'd love to see a definitive answer on this, as the benefits of the specialized functions (classify, regress) seem quite minimal.
The differences I've seen so far are...
1) When using tf.feature_column.indicator_column wrapping tf.feature_column.categorical_column_with_vocabulary_* in a DNNClassifier model, I've had problems where the Predict API on the TensorFlow server sometimes fails to parse/map string inputs according to the vocabulary file/list. The Classify API, on the other hand, properly mapped strings to their index in the vocabulary (categorical_column), then to the one-hot/multi-hot encoding (indicator_column), and returned (what seems to be) the correct classification response to the query.
2) The response format: [[class, score], [class, score], ...] for the Classify API vs. [class[], score[]] (parallel arrays) for the Predict API. One or the other may be preferable if you need to parse the data afterwards (see the sketch after the TL;DR below).
TL;DR: With indicator_column wrapping categorical_column_with_vocabulary_*, I've experienced issues with the vocabulary mapping when serving with the Predict API, so I'm using the Classify API.
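For illustration, here is a rough sketch of how those two response shapes look when querying TensorFlow Serving's REST API (the model name, port, and feature keys are made up; assumes a TF Serving build with the REST endpoint enabled):

import requests

# Predict endpoint: tensor-oriented, returns parallel arrays per output key
resp = requests.post('http://localhost:8501/v1/models/my_model:predict',
                     json={'instances': [{'age': 42, 'country': 'US'}]})
# e.g. {"predictions": [{"classes": [...], "scores": [...]}]}

# Classify endpoint: tf.Example-oriented, returns (label, score) pairs
resp = requests.post('http://localhost:8501/v1/models/my_model:classify',
                     json={'examples': [{'age': 42, 'country': 'US'}]})
# e.g. {"results": [[["class_a", 0.9], ["class_b", 0.1]]]}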
I know that it is possible, for example in TensorFlow but also in PyTorch or other frameworks, to store an instance of a trained (or in-training) model so that it can be loaded in the future, loaded by another machine, or simply used as a checkpoint during training.
What I wonder is whether there is any way, similar to the one mentioned above, to store the difference (maybe not exactly the algebraic subtraction, but a similar tensor-level concept) between two instances of the same neural network (same architecture, different weights) for efficiency purposes.
If you are wondering why this would be convenient, consider a hypothetical setting where there are several entities and all of them know one model instance (a "shared model"); using the "difference" calculated with respect to this shared model could save storage space or bandwidth (if the local model parameters have to be sent over the Internet to another machine).
The hypothesis is that it is possible to reconstruct a model knowing the shared model and its "difference" from the model to reconstruct.
Summarizing my questions:
Is there any built-in feature in TensorFlow, PyTorch, etc. to do this?
In your opinion, would it be convenient to do something like that? If not, why not?
PS: In the literature this concept exists and has recently been explored under the "Federated Learning" topic, where the "difference" I mentioned is called an update.
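To make the idea concrete, here is a minimal sketch of what I mean, using PyTorch state_dicts (the architecture is a made-up toy network, and 'update.pt' is just a placeholder file name):

import torch
import torch.nn as nn

# both parties are assumed to know the same (toy) architecture
def make_model():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

shared_model = make_model()   # the "shared model" everyone already has
local_model = make_model()    # same architecture, different (e.g. locally trained) weights

shared_state = shared_model.state_dict()
local_state = local_model.state_dict()

# store only the per-tensor difference (the "update")
update = {name: local_state[name] - shared_state[name] for name in shared_state}
torch.save(update, 'update.pt')

# later / on another machine: reconstruct the local weights from shared model + update
update = torch.load('update.pt')
reconstructed = {name: shared_state[name] + update[name] for name in shared_state}
local_model.load_state_dict(reconstructed)

(Of course, the raw dense difference is the same size as the weights themselves, so any saving would have to come from the update being sparse or more compressible than the full weights.)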
I see many examples using either MonitoredTrainingSession or tf.Estimator as the training framework, but it's not clear why I would use one over the other. Both are configurable with SessionRunHooks, and both integrate with tf.data.Dataset iterators and can feed training/validation datasets. I'm not sure what the benefits of one setup over the other would be.
The short answer is that MonitoredTrainingSession gives the user access to the Graph and Session objects and to the training loop, while Estimator hides the details of graphs and sessions from the user and generally makes it easier to run training, especially with train_and_evaluate, if you need to evaluate periodically.
MonitoredTrainingSession differs from a plain tf.Session() in that it handles variable initialization and sets up file writers, and it also incorporates functionality for distributed training.
The Estimator API, on the other hand, is a high-level construct just like Keras. It is perhaps used less in examples because it was introduced later. It also allows you to distribute training/evaluation with DistributionStrategy, and it ships several canned estimators that allow rapid prototyping.
In terms of model definition they are pretty much equal: both let you use keras.layers or define a completely custom model from the ground up. So, if for whatever reason you need access to graph construction or want to customize the training loop, use MonitoredTrainingSession. If you just want to define a model, train it, and run validation and prediction without additional complexity and boilerplate code, use Estimator.
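As a rough, self-contained sketch of the difference (TF 1.x; the toy loss, sizes, and input_fn below are made up for illustration):

import numpy as np
import tensorflow as tf

# --- MonitoredTrainingSession: you build the graph and own the loop ---
tf.reset_default_graph()
global_step = tf.train.get_or_create_global_step()
loss = tf.get_variable('w', initializer=5.0) ** 2
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)

with tf.train.MonitoredTrainingSession(
        hooks=[tf.train.StopAtStepHook(last_step=100)]) as sess:
    while not sess.should_stop():
        sess.run(train_op)

# --- Estimator: the loop, session, and checkpointing are handled for you ---
def model_fn(features, labels, mode):
    preds = tf.layers.dense(features['x'], 1)
    loss = tf.losses.mean_squared_error(labels, preds)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def input_fn():
    x = np.random.rand(64, 1).astype(np.float32)
    return tf.data.Dataset.from_tensor_slices(({'x': x}, 2 * x)).batch(8)

estimator = tf.estimator.Estimator(model_fn=model_fn)
estimator.train(input_fn, max_steps=100)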
I've been using tensorflow for a while now. At first I had stuff like this:
def myModel(training):
    # share variables between the training and validation copies of the graph
    with tf.variable_scope('model', reuse=not training):
        # ... build the model ...
        return model

training_model = myModel(True)
validation_model = myModel(False)
Mostly because I started with some MOOCs that taught me to do that. But they also didn't use TFRecords or queues, and I didn't know why I was using two separate models. I tried building only one and feeding the data with feed_dict: everything worked.
Ever since, I've usually been using only one model. My inputs are always placeholders and I just feed in either training or validation data.
Lately, I've noticed some weird behavior in models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I drive with a tf.bool placeholder. I've seen tf.layers used mostly with a tf.estimator.Estimator, but I'm not using one. I've read the Estimator code and it appears to create two different graphs for training and validation. It may be that those issues arise from not having two separate models, but I'm still skeptical.
Is there a clear reason I'm not seeing that implies that two separate-equivalent models have to be used?
You do not have to use two neural nets for training and validation. After all, as you noticed, TensorFlow lets you keep a single monolithic train-and-validate net by allowing the training parameter of some layers to be a placeholder.
However, why wouldn't you? By having separate nets for training and validation, you set yourself on the right path and future-proof your code. Your training and validation nets might be identical today, but you might later see some benefit to having distinct nets, such as having different inputs or different outputs, removing intermediate layers, etc.
Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
So, keeping a single net is fine; in my experience though, any project other than playful experimentation is likely to implement a distinct validation net at some point, and tensorflow makes it easy to do just that with minimal penalty.
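For completeness, a minimal sketch of the single-net variant with a tf.bool placeholder (TF 1.x; the input and layer sizes are made up). One detail worth noting, and a common source of the "weird behavior" mentioned in the question: tf.layers.batch_normalization keeps its moving-average updates in the UPDATE_OPS collection, which must be run explicitly during training or the validation-time statistics will be stale.

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 20])
labels = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool, [], name='is_training')

net = tf.layers.dense(inputs, 64, activation=tf.nn.relu)
net = tf.layers.batch_normalization(net, training=is_training)
net = tf.layers.dropout(net, rate=0.5, training=is_training)
logits = tf.layers.dense(net, 10)
loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

# run the batch-norm update ops together with the training step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer().minimize(loss)

# then feed is_training=True for training batches and False for validation batches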
tf.estimator.Estimator classes do indeed create a new graph for each invocation, and this has been the subject of furious debate; see this issue on GitHub. Their approach is to build the graph from scratch on each train, evaluate, and predict invocation and to restore the model from the last checkpoint. There are clear downsides to this approach, for example:
A loop that calls train and evaluate will create two new graphs on every iteration.
One can't easily evaluate while training (there are workarounds such as train_and_evaluate, but they don't look very nice).
I tend to agree that having the same graph and model for all actions is convenient and I usually go with this solution. But in a lot of cases when using a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care how exactly the model is organized.
I have two different corpora, and what I want is to train the model with both. To do it, I thought it could be something like this:
model.build_vocab(sentencesCorpus1)
model.build_vocab(sentencesCorpus2)
Would it be right?
No: each time you call build_vocab(corpus), like that, it creates a fresh vocabulary from scratch – discarding any prior vocabulary.
You can provide an optional argument to build_vocab(), update=True, which tries to add to the existing vocabulary. However:
it wasn't designed/tested with Doc2Vec in mind, and as of right now (February 2018), using it with Doc2Vec is unlikely to work and often causes memory-fault crashes. (See https://github.com/RaRe-Technologies/gensim/issues/1019.)
it's still best to train() with all available data together - any sort of multiple calls to train(), with differing data subsets each time, introduces other murky tradeoffs in model quality/correctness that are easy to get wrong. (And, when calling train(), be sure to provide correct values for its required parameters – the practices shown in most online examples are typically only correct for the case where build_vocab() was called once, with exactly the same texts as are later passed to train().)
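For example, here is a minimal sketch of building one vocabulary and training over both corpora at once (assuming a reasonably recent gensim and that the two corpora are lists of TaggedDocument objects; the tiny documents below are placeholders):

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

sentencesCorpus1 = [TaggedDocument(['some', 'words'], ['c1_0'])]   # placeholder data
sentencesCorpus2 = [TaggedDocument(['other', 'words'], ['c2_0'])]  # placeholder data
combined = sentencesCorpus1 + sentencesCorpus2

model = Doc2Vec(vector_size=100, min_count=1, epochs=20)
model.build_vocab(combined)                      # one vocabulary over both corpora
model.train(combined,
            total_examples=model.corpus_count,   # set by build_vocab
            epochs=model.epochs)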
I have received tens of thousands of user reviews of my app. Many of the comments mean essentially the same thing, and I cannot read them all. I would therefore like to use a Python program to analyze all of the comments and identify the most frequent and most important feedback. How can I do that? I can already download all of the app's comments, and I have a preliminary understanding of the Google Prediction API.
You can use the Google Prediction API to characterize your comments as important or unimportant. What you'd want to do is manually classify a subset of your comments. Then you upload that manually classified data to Google Cloud Storage and, using the Prediction API, train your model. This step is asynchronous and can take some time. Once the trained model is ready, you can use it to programmatically classify the remaining (and any future) comments.
Note that the more comments you classify manually (i.e. the larger your training set), the more accurate your programmatic classifications will be. Also, you can extend this idea as follows: instead of a binary classification (important/unimportant), you could use grades of importance, e.g. on a 1-5 scale. Of course, that entails more manual labor in constructing your model so the best strategy will be a function of your needs and how much time you can spend building the model.
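If you want to prototype the same workflow locally before committing to the Prediction API, here is a rough sketch of the idea using scikit-learn instead (the tiny example comments and labels are made up):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# the subset you classified by hand
labeled_texts = ['crashes on startup', 'love the new colors', 'loses my data']
labels = ['important', 'unimportant', 'important']
# the remaining comments to classify programmatically
unlabeled_texts = ['app crashes whenever I open it']

vectorizer = TfidfVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(labeled_texts), labels)

print(clf.predict(vectorizer.transform(unlabeled_texts)))   # e.g. ['important']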