I wish to repeat a series of image classification experiments by reusing a CNN with the same CNN with identical hyperparameters especially initializations. So, if I save a model after I have instantiated it and before I train it, does that also save the initializations so I then reload it later and train with a different data set and labels, does it start this new model with the same hyperparameters and initializations as the first model I trained with the first data set/classification labels? I am currently using fastai which is, of course, a library/set of API's, built on Pythorch but I think that everyone would be helped with a more general explanation that covers all CNN's using any library.
I expect an answer that says, "after this point in the workflow creating a CNN, the model is initialized and if you save it at this point, you can reload it later and use the same hyperparameters and initializations in your next model."
you can save the learner as soon it is created.
Example:
learn = cnn_learner(data,models.resnet34,metrics=error_rate)
learn.save('init')
later on:
learn.load('init)
Related
I am trying to use machine learning in Python. Right now I am using sklearn and TensorFlow. I was wondering what to do if I have a model that needs updating when new data comes. For example, I have financial data. I built an LSTM model with TensorFlow and trained it. But new data comes in every day, and I don't want to retrain the model every day. Is there a way just to update the model and not retrain it from scratch?
In sklearn, the documentation for .fit() method (using DecisionTreeClassifier as an example) says that it
Build a decision tree classifier from the training set (X, y).
So it seems like it will retrain the entire model from scratch.
In tensorflow, .fit() method (using Sequential as an example) say
Trains the model for a fixed number of epochs (iterations on a
dataset).
So it seems like it does update the model instead of retraining. But I am not sure if my understanding is correct. I would be grateful for some clarification. And if sklearn indeed retrains the entire model using .fit(), is there a function that would just update the model instead of retraining from scratch?
When you say update and not train. Is it just updating the weights using the new data?
If so you can adopt two approaches with Transfer learning.
Finetune: Initialise a model with the weights from old model and retrain it on new data.
Add a new layer: Add a new layer and update the weights in this layer only while freezing the remaining weights in the network.
for more details read the tensorflow guide on tansferlearning
In tensorflow, there is a method called train_on_batch() that you can call on your model.
Say you defined your model as sequential, and you initially trained it on the existing initial_dataset using the fit method.
Now, you have new data in your hand -> call it X_new_train,y_new_train
so you can update the existing model using train_on_batch()
An example would be:
#generate some X_new_train (one batch)
X_new_train = tf.random.normal(shape=[no_of_samples_in_one_batch,100])
#generate corresponding y_new_train
y_new_train = tf.constant([[1.0]]*no_of_samples_in_one_batch)
model.train_on_batch(X_new_train,y_new_train)
Note that the idea of no_of_samples_in_one_batch (also called batch size) is not so important here. I mean whatever number of samples that you have in your data will be considered as one batch!
Now, coming to sklearn, I am not sure whether all machine learning models can incrementally learn (update weights from new examples). There is a list of models that support incremental learning:
https://scikit-learn.org/0.15/modules/scaling_strategies.html#incremental-learning
In sklearn, the .fit() method retrains on the dataset i.e as you use .fit() on any dataset, any info pertaining to previous training will all be discarded. So assuming you have new data coming in every day you will have to retrain each time in the case of most sklearn algorithms.
Although, If you like to retrain the sklearn models instead of training from scratch, some algorithms of sklearn (like SGDClassifier) provide a method called partial_fit(). These can be used to retrain and update the weights of an existing model.
As per Tensorflow, the .fit() method actually trains the model without discarding any info pertaining to previous trainings. Hence each time .fit() is used via TF it will actually retrain the model.
Tip: you can use SaveModel from TF to save the best model and reload and re-train the model as and when more data keeps flowing in.
I trained one custom object detection model for my use case using darknet and yolov4. I mentioned 3 classes in the obj.name file as mentioned below:
# data/obj.names
no_helmet
helmet
vest
The training was completed and detection was also working with good results.
Now, I wanted to add 2 new classes to the model, so I updated the class file with 2 new class names:
# data/obj.names
no_helmet
helmet
vest
fire
smoke
I made changes to the config file and updated classes=3 to classes=5 and filter=24 to filter=30 for all the [yolo] layers and preceding [convolutional] layers.
For the dataset, I only provided images and annotations for the 2 new classes (fire and smoke).
Then I started the darknet training and for the weights parameter, I provided my old yolov4 trained weights. After it was completed, I ran tests and it didn't detect anything in the image. Not even the old classes.
Where did I go wrong?
What I feel is that since I didn't provide a dataset for older classes, the model forgot those. But, then it should have at least detected new classes, right? or am I wrong here?
NEW EDIT:
I trained again with the combined dataset(old 3 classes and 2 new classes) on the pre-trained custom weights (trained for the first 3 classes) and when I ran it for the test, still no output is there.
Could somebody explain to me what's going on here? I'm thinking there is some mathematics behind it which I don't know about.
Do I have to train from scratch every time?
You can't insert new classes in a trained model.
When you trained using only the fire and smoke dataset, your config were not configured for 2 classes, but for 5 instead. Probably it's why you didn't get anything in your second test.
The model does not forget, but uses that weight as a head start for your new model. Just train again with a dataset containing all categories labelled.
From the procedure that YOLO follows in training, any further training will enhance and modify the currently layers, so if you added new classes and retrained, you will enhance the exiting one and create new classes from scratch, which means not all classes will be the same during detection. I think the safest way is to train all from scratch.
I tried image classification using trained model and its working well but some images could not find perfectly in that time have to get that image and label from users so my doubt is..Is it possible to add new data into already trained model?
No, during inference time you use the weights of the trained model for predictions. Which basically means that at the time your model is deployed the capabilities of your image classifier are fixed by the weights. If you wish to improve your model, you would have to retrain your model with the new - data. However, there is another paradigm of learning called "Online Learning" where the model is continuously learning and modifying the weights. In this case your weights are not fixed and your model is continuously updating its weights with each training input. However afaik this is not usually recommended for CNNs, because the backward pass of gradients is computationally intensive and your inference will be slow because of this.
No model can predict with 100% accuracy if it does it's an ideal model. And if you want to add more data to your train model you have to retrain the model with the new data. Having more data is always a good idea. It allows the “data to tell for itself,” instead of relying on assumptions and weak correlations. Presence of more data results in better and accurate models. So if you want to get better accuracy you have to train your model with more data. Without retraining, you can't add data to your trained model.
I'm working on a project that requires the recognition of just people in a video or a live stream from a camera. I'm currently using the tensorflow object recognition API with python, and i've tried different pre-trained models and frozen inference graphs. I want to recognize only people and maybe cars so i don't need my neural network to recognize all 90 classes that come with the frozen inference graphs, based on mobilenet or rcnn, as it seems this slows the process, and 89 of this 90 classes are not needed in my project. Do i have to train my own model or is there a way to modify the inference graphs and the existing models? This is probably a noob question for some of you, but mind that i've worked with tensorflow and machine learning for just one month.
Thanks in advance
Shrinking the last layer to output 1 or two classes is not likely to yield large speed ups. This is because most of the computation is in the intermediate layers. You could shrink the intermediate layers, but this would result in poorer accuracy.
Yes, you have to train own model. Let's see in short words some ways how to do.
OPTION 1. When you want to apply transfer knowledge as maximum as possible, you can froze the CNN layers. After, you change a quantity of detected classes with dimension of classifier (dense layers). The classifier is the latest part in CNN architecture. Now, you should retrain only classifier.
OPTION 2. Assuming, you want to apply transfer knowledge for first layers of CNN (for example, froze first 2-3 CNN layers) and retrain rest of CNN with classifier. After, you change a quantity of detected classes with dimension of classifier. Now, you should retrain rest of CNN layers and classifier.
OPTION 3. Assuming, you want to retrain whole CNN with classifier. After, you change a quantity of detected classes with dimension of classifier. Now, you should retrain whole CNN with classifier.
Generally, the Tensorflow Object Detection API is a good start for beginners! How to proceed with your problem you can see here more detail about whole process and extra explanation here.
I've been using tensorflow for a while now. At first I had stuff like this:
def myModel(training):
with tf.scope_variables('model', reuse=not training):
do model
return model
training_model = myModel(True)
validation_model = myModel(False)
Mostly because I started with some MOOCs that tought me to do that. But they also didn't use TFRecords or Queues. And I didn't know why I was using two separate models. I tried building only one and feeding the data with the feed_dict: everything worked.
Ever since I've been usually using only one model. My inputs are always place_holders and I just input either training or validation data.
Lately, I've noticed some weird behavior on models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I use with a tf.bool placeholder. I've seen tf.layers used generally with a tf.estimator.Estimator, but I'm not using it. I've read the Estimators code and it appears to create two different graphs for training and validation. May be that those issues are arising from not having two separate models, but I'm still skeptical.
Is there a clear reason I'm not seeing that implies that two separate-equivalent models have to be used?
You do not have to use two neural nets for training and validation. After all, as you noticed, tensorflow helps you having a monolothical train-and-validate net by allowing the training parameter of some layers to be a placeholder.
However, why wouldn't you? By having separate nets for training and for validation, you set yourself on the right path and future-proof your code. Your training and validation nets might be identical today, but you might later see some benefit to having distinct nets such as having different inputs, different outputs, removing out intermediate layers, etc.
Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
So, keeping a single net is fine; in my experience though, any project other than playful experimentation is likely to implement a distinct validation net at some point, and tensorflow makes it easy to do just that with minimal penalty.
tf.estimator.Estimator classes indeed create a new graph for each invocation and this has been the subject of furious debates, see this issue on GitHub. Their approach is to build the graph from scratch on each train, evaluate and predict invocations and restore the model from the last checkpoint. There are clear downsides of this approach, for example:
A loop that calls train and evaluate will create two new graphs on every iteration.
One can't evaluate while training easily (though there are workarounds, train_and_evaluate, but this doesn't look very nice).
I tend to agree that having the same graph and model for all actions is convenient and I usually go with this solution. But in a lot of cases when using a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care how exactly the model is organized.