I am training a deep learning model using Mask R-CNN from the following git repository: matterport/Mask_RCNN. I rely on heavy augmentation of my dataset (original dataset: 59 images of 1988x1355x3, each with > 80 annotations), which I store locally (necessary to evaluate the type/degree of augmentation against the validation metrics). The augmented dataset counts 6000 images. The images vary in x and y dimensions because of resolution reduction and affine transformations; I assume the differing x,y-dimensions will not affect the final tests.
However, my Python kernel crashes whenever I load more than 'X' images to train the model.
Hence, I came up with the idea of splitting the dataset into sub-datasets and iterating through them, using the 'last' trained weights as the starting point for the next round. But I am not sure whether the results will be the same (read: the same, taking the stochastic nature of 'stochastic gradient descent' into account)?
I also wonder whether the results would be the same if I don't iterate through the sub-datasets per epoch, but instead train each sub-dataset for Y epochs (e.g. 20 for 'heads' only, 10 for 'all layers')?
Still, I am sure this is not the most efficient way of solving this issue. Ideas for improvement are welcome.
Note that I am not using keras.preprocessing.image.ImageDataGenerator(); as I understand it, it randomly generates data and feeds it to the model, replacing the input for the epoch, whereas I would like to feed the whole dataset to the model.
I came up with the idea of splitting the dataset into sub-datasets and iterating through them, using the 'last' trained weights as the starting point for the next round. But I am not sure if the results will be the same?
You are doing the same thing as ImageDataGenerator does (creating your own mini-batches), but less optimally. The result will be the same with respect to what?
If you mean with respect to a model trained with all the data in a single batch: most probably not, since a smaller batch generally means slower convergence. But this can be compensated for by training for more epochs.
Another issue is reproducibility. If you want to reproduce your model with the same results each time, just set seeds:
import random
random.seed(1)  # seed Python's built-in RNG
import numpy as np
np.random.seed(1)  # seed NumPy's RNG
import tensorflow
tensorflow.random.set_seed(1)  # seed TensorFlow's RNG (TF 2.x API)
Another concept is gradient accumulation. It will let you train with an effectively large batch size without keeping too many images in memory at a time.
https://github.com/CyberZHG/keras-gradient-accumulation
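If you would rather not add a dependency, the same idea can be sketched by hand in a custom TensorFlow 2 training loop. This is only an illustration of the concept, not the matterport Mask R-CNN training code; model, loss_fn and dataset are placeholders you would have to provide:

import tensorflow as tf

# Gradient-accumulation sketch: sum gradients over several small steps,
# then apply them once, emulating a larger batch size.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
accum_steps = 8  # effective batch size = accum_steps * per-step batch size

accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]
for step, (images, targets) in enumerate(dataset):
    with tf.GradientTape() as tape:
        # divide so the accumulated gradient is the mean over accum_steps steps
        loss = loss_fn(model(images, training=True), targets) / accum_steps
    grads = tape.gradient(loss, model.trainable_variables)
    accum_grads = [a + g for a, g in zip(accum_grads, grads)]
    if (step + 1) % accum_steps == 0:
        optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
        accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]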
Finally, keras.preprocessing.image.ImageDataGenerator() in fact trains on the whole dataset; it just chooses a random sample at each step (you're doing the same thing with your so-called sub-datasets).
You can seed the ImageDataGenerator so it is reproducible and not entirely random.
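A minimal sketch of a seeded generator (x_train, y_train and model are placeholders here, not taken from your code):

from keras.preprocessing.image import ImageDataGenerator

# With a fixed seed, the generator produces the same augmented sequence on every run.
datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)
generator = datagen.flow(x_train, y_train, batch_size=32, seed=1)
model.fit(generator, epochs=20)  # older Keras versions use model.fit_generator(...)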
Related
I am working on a task of generating synthetic data to help train my model. This means that training is performed on synthetic + real data, and testing on real data only.
I was told that the batch normalization layers might be trying to find weights that are good for both kinds of data while training, which is a problem since the distribution of my synthetic data is not exactly equal to the distribution of the real data. So the idea would be to have different 'copies' of the batch normalization weights, so that the neural network estimates separate weights for synthetic and real data and uses only the real-data weights for evaluation.

Could someone suggest good ways to actually implement that in PyTorch? My idea was the following: after each epoch of training on one dataset, I would go through all batch norm layers and save their weights, then at the beginning of the next epoch iterate over them again and load the right weights. Is this a good approach? Still, I am not sure how I should deal with the batch norm weights at test time, since batch norm behaves differently there.
It sounds like the problem you're worried about is that your neural network will learn weights that work well when the batch norm is computed for a batch of both real and synthetic data, and then later at test time it will compute a batch norm on just real data?
Rather than trying to track multiple batch norms, you probably just want to set track_running_stats to True for your batch norm layer, and then put it into eval mode when testing. This will cause it to compute a running mean and variance over multiple batches while training, and then it will use that mean and variance later at test time, rather than looking at the batch stats for the test batches.
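A minimal sketch of that pattern, using a standalone BatchNorm2d layer rather than your actual model:

import torch
import torch.nn as nn

# track_running_stats=True (the default) keeps a running mean/variance during training.
bn = nn.BatchNorm2d(num_features=64, track_running_stats=True)

# Training mode: batch statistics are used and the running stats are updated.
bn.train()
_ = bn(torch.randn(8, 64, 32, 32))

# Eval mode: the stored running mean/variance are used instead of batch statistics.
bn.eval()
with torch.no_grad():
    out = bn(torch.randn(2, 64, 32, 32))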
(This is often what you want anyway, because depending on your use case, you might be sending very small batches to the deployed model, and so you want to use a pre-computed mean and variance rather than relying on stats for those small batches.)
If you really want to be computing fresh means and variances at test time, what I would do is instead of passing a single batch with both real and synthetic data into your network, I'd pass in one batch of real data, then one batch of synthetic data, and average the two losses together before backprop. (Note that if you do this you should not rely on the running mean and variance later -- you'll have to either set track_running_stats to False, or reset it when you're done and run through a few dummy batches with only real data to compute reasonable values. This is because the running mean and variance stats are only useful if they're expected to be roughly the same for every batch, and you're instead polarizing the values by feeding in different types of data in different batches.)
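A rough sketch of that alternating scheme; model, criterion, optimizer, real_loader and synth_loader are placeholders, not part of your code:

# One real batch and one synthetic batch per step; average the two losses
# before backprop so batch norm never sees a mixed batch.
for (x_real, y_real), (x_syn, y_syn) in zip(real_loader, synth_loader):
    optimizer.zero_grad()
    loss_real = criterion(model(x_real), y_real)  # batch norm sees only real data
    loss_syn = criterion(model(x_syn), y_syn)     # batch norm sees only synthetic data
    loss = 0.5 * (loss_real + loss_syn)
    loss.backward()
    optimizer.step()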
I'm currently creating a model, and while creating it I came up with some questions. Does training the same model with the same data multiple times lead to better precision on those objects, since you're training it every time? And what could be the issue when the object sometimes gets 90% precision, but when I re-run it gets lower precision or even fails to predict the right object? Is it because TensorFlow is running on the GPU?
I will guess that you are doing image recognition and that you want to identify images (objects) using a neural network built with Keras. You should train it once, but during training you will run several epochs, meaning the algorithm adapts the weights over several rounds (epochs). In each round it goes over all training images. Once trained, you can use the model to identify images/objects.
You can evaluate the accuracy of your trained model over the same training set, but it is better to use a different set (see train_test_split from sklearn for instance).
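For example (X and y here stand for your images and labels):

from sklearn.model_selection import train_test_split

# Hold out 20% of the data for evaluation; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)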
Training is a stochastic process, meaning that every time you train your network the end result will be somewhat different. Hence, you will get different accuracies. The stochasticity comes, for instance, from different initial weights or from using stochastic gradient descent methods.
The question does not appear to have anything to do with Keras or TensorFlow specifically, but with a basic understanding of how neural networks work. There is no connection to running TensorFlow on the GPU. You will also not get better precision by repeatedly training with the same objects. If you train your model on a dataset for a very long time (many epochs), you might run into overfitting: the accuracy of your model on that training dataset will be very high, but the model will have low accuracy on other datasets.
A common technique is to split your data into train and validation datasets, then train your model using EarlyStopping. This trains on the training dataset, calculates the loss against the validation dataset, and keeps training until no further improvement is seen. You can set a patience parameter to wait for X epochs without improvement before stopping training (and optionally save the best model).
https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
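A minimal sketch of the callback setup, assuming a compiled Keras model and pre-split X_train/X_val arrays (all placeholders):

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop once the validation loss has not improved for 10 epochs and keep the best weights.
callbacks = [
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
    ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
]
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=200, callbacks=callbacks)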
Another trick is image augmentation with ImageDataGenerator, which will generate synthetic data for you (rotations, shifts, mirror images, brightness adjustments, noise, etc.). This can effectively increase the amount of data you have to train with, thus reducing overfitting.
https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/
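A minimal augmentation sketch (the parameter values are just illustrative; X_train, y_train and model are placeholders):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Typical augmentations: rotations, shifts, flips and brightness changes.
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    brightness_range=(0.8, 1.2),
)
model.fit(datagen.flow(X_train, y_train, batch_size=32), epochs=50)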
I tried training an AutoEnsembleEstimator with two DNNEstimators (with hidden layers of 1000, 500 and 100 units) on a dataset with around 1850 features (after feature engineering), and I kept running out of memory (even on large 400 GB+ high-memory GCP VMs).
I'm using the above for binary classification. Initially I had trained various models and combined them by training a traditional ensemble classifier over the trained models. I was hoping that AdaNet would simplify the generated model graph, making inference easier, rather than having to keep separate graphs/pickles for various scalers/scikit-learn models/Keras models.
Three hypotheses:
You might have too many DNNs in your ensemble, which can happen if max_iteration_steps is too small and max_iterations is not set (both of those are constructor arguments to AutoEnsembleEstimator). If you want to train each DNN for N steps and you want an ensemble of 2 DNNs, set max_iteration_steps=N, set max_iterations=2, and train the AutoEnsembleEstimator for 2N steps (a minimal sketch follows this list).
You might have been on adanet-0.6.0-dev, which had a memory leak. To fix this, update to the latest release and see if the problem still arises.
Your batch size might have been too large. Try lowering your batch size.
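A rough sketch of the first hypothesis; the head, candidate_pool and input_fn below are assumptions about your setup, not taken from it:

import adanet
import tensorflow as tf

# Train each candidate DNN for N steps and stop after 2 iterations,
# yielding an ensemble of 2 DNNs.
N = 10000
estimator = adanet.AutoEnsembleEstimator(
    head=tf.estimator.BinaryClassHead(),   # binary classification head (TF 2.x name; assumption)
    candidate_pool=candidate_pool,         # e.g. your two DNNEstimators (placeholder)
    max_iteration_steps=N,
    max_iterations=2,
)
estimator.train(input_fn=train_input_fn, max_steps=2 * N)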
I am training a neural network model, and my model fits the training data well. The training loss decreases steadily and everything works fine. However, when I inspect the weights of my model, I find that they have not changed much since random initialization (I did not use any pretrained weights; all weights are initialized with PyTorch's defaults). Every dimension of the weights changed by only about 1%, while the accuracy on the training data climbed from 50% to 90%.
What could account for this phenomenon? Is the dimensionality of the weights too high, meaning I should reduce the size of my model? Or are there other possible explanations?
I understand this is quite a broad question, but I think it's impractical for me to show my model and analyze it mathematically here. So I just want to know what the general/common causes of this behaviour could be.
There are almost always many local optima in a problem, so one thing you cannot say, especially in high-dimensional feature spaces, is which optimum your model parameters will settle into. An important point is that for every set of weights your model computes to reach an optimum, there are (because the weights are real-valued) infinitely many other sets of weights for that same optimum; the proportions of the weights to each other are what matter, because you are trying to minimize the cost, not to find a unique set of weights with zero loss on every sample. Every time you train, you may get a different result depending on the initial weights.

When the weights change very little, and with almost the same ratio to each other, it suggests your features are highly correlated (i.e. redundant). And since you are getting very high accuracy with only a small change in the weights, the only thing I can think of is that your classes are far apart from each other in feature space. Try removing features one at a time, retrain, and check the results; if the accuracy stays good, keep removing features until you hopefully reach a 2- or 3-dimensional space in which you can plot your data, visualize how the points are distributed, and make some sense of it.
EDIT: A better approach is to use PCA for dimensionality reduction instead of removing features one by one.
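For example (X stands for your feature matrix):

from sklearn.decomposition import PCA

# Project the features onto 2 principal components for visualization.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)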
I am using scikit-learn's svm library for classifying images. I was wondering: when I fit the data, does it work sequentially, or does it erase the previous classification material and re-fit to the new data? For example, if I fit 100 images to the classifier, can I then sequentially fit another 100 images, or will the SVM delete the work it performed on the original 100 images? This is difficult for me to explain, so I'll provide an example:
In order to fit an SVM classifier to 200 images, can I do this:
from sklearn.svm import SVC

clf = SVC(kernel='linear')
clf.fit(test.data[0:100], test.target[0:100])
clf.fit(test.data[100:200], test.target[100:200])
Or must I do this:
from sklearn.svm import SVC

clf = SVC(kernel='linear')
clf.fit(test.data[:200], test.target[:200])
I am wondering only because I run into memory errors when trying to use .fit(X, y) with too many images at once. So is it possible to use fit sequentially and "increment" my classifier upwards, so that it is technically trained on 10000 images but only 100 at a time?
If this is possible, please confirm and explain; and if it's not possible, please explain why.
http://scikit-learn.org/stable/developers/index.html#estimated-attributes
The last-mentioned attributes are expected to be overridden when you call fit a second time without taking any previous value into account: fit should be idempotent.
https://en.wikipedia.org/wiki/Idempotent
So yes, the second call will erase the old model and compute a new one. You can check this yourself if you read the Python code, for example in sklearn/svm/classes.py.
I think you need minibatch training, but I don't see a partial_fit implementation for SVC. This may be because the scikit-learn team recommends SGDClassifier and SGDRegressor for datasets with more than 100k samples (see http://scikit-learn.org/stable/tutorial/machine_learning_map/). Try using them with minibatches, as in the sketch below.
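A minimal sketch, where load_batch(i) is a placeholder for however you read 100 images (and their labels) at a time:

import numpy as np
from sklearn.linear_model import SGDClassifier

# Incremental (minibatch) training: partial_fit updates the existing model
# instead of refitting from scratch.
clf = SGDClassifier(loss='hinge')   # hinge loss gives a linear SVM-like classifier
classes = np.array([0, 1])          # all class labels, required on the first call
for i in range(100):                # 100 batches of 100 images = 10000 images total
    X_batch, y_batch = load_batch(i)
    clf.partial_fit(X_batch, y_batch, classes=classes)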