I am training a machine learning model on a dataset I generate with another program. I designed the model to work on 128x128 images; however, as I approach 100,000 images I run into issues with the training crashing without any informative output (the kernel dies). I assume this is caused by memory limits, since it only occurs as the number of images increases.
To mitigate the memory usage, I noticed that all of the pixels in the input images are either 0 or 255, meaning that after normalization they are 0 or 1. Is there a way to exploit this in PyTorch to reduce memory usage? Or are there other benefits to be had when the input image contains only binary values?
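For illustration, one way to exploit this in PyTorch (a sketch, assuming the raw data is available as a NumPy array of 0/255 images; the class and variable names here are made up for the example) is to store the dataset packed as bits, or at least as uint8, and expand to float32 only inside the Dataset:

import numpy as np
import torch
from torch.utils.data import Dataset

class PackedBinaryImages(Dataset):
    """Stores 0/255 images packed at 1 bit per pixel and expands them
    to float32 only when a sample is requested."""
    def __init__(self, images_uint8):          # shape (N, 128, 128), values 0/255
        self.n = len(images_uint8)
        self.shape = images_uint8.shape[1:]
        # 8 pixels per byte -> roughly 32x smaller than float32 storage.
        self.packed = np.packbits(images_uint8 > 0, axis=None).reshape(self.n, -1)
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        bits = np.unpackbits(self.packed[idx])[: self.shape[0] * self.shape[1]]
        img = bits.reshape(self.shape).astype(np.float32)   # already 0/1
        return torch.from_numpy(img).unsqueeze(0)           # (1, 128, 128)

Even just keeping the array as uint8 (or bool) and calling .float() per batch cuts memory by a factor of four relative to float32.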
I am running the GPT-2 code for the large model (774M). It is used to generate text samples through interactive_conditional_samples.py, link: here
I've given it an input file containing prompts, which are automatically selected to generate output. This output is also automatically copied into a file. In short, I'm not training it, I'm using the model to generate text.
Also, I'm using a single GPU.
The problem I'm facing is that the code is not utilizing the GPU fully.
Using the nvidia-smi command, I was able to see the utilization below:
https://imgur.com/CqANNdB
It depends on your application. It is not unusual to have low GPU utilization when the batch_size is small. Try increasing the batch_size for more GPU utilization.
In your case, you have set batch_size=1 in your program. Increase the batch_size to a larger number and verify the GPU utilization.
Let me explain using MNIST-sized networks. They are tiny and it's hard to achieve high GPU (or CPU) efficiency for them. You will get higher computational efficiency with a larger batch size, meaning you can process more examples per second, but you will also get lower statistical efficiency, meaning you need to process more examples in total to reach the target accuracy. So it's a trade-off. For tiny character models, the statistical efficiency drops off very quickly after batch_size=100, so it's probably not worth growing the batch size for training. For inference, you should use the largest batch size you can.
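As a rough illustration of the computational side of that trade-off (a toy PyTorch sketch, not the GPT-2 code itself), throughput usually climbs with the batch size until the GPU is saturated:

import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# A made-up toy model, only to show examples/second versus batch size.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 1024)
).to(device)

for batch_size in (1, 8, 64):
    x = torch.randn(batch_size, 1024, device=device)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        for _ in range(100):
            model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.time() - start
    print(f"batch_size={batch_size}: {100 * batch_size / elapsed:.0f} examples/s")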
Hope this answers your question. Happy Learning.
I tried training an AutoEnsembleEstimator with two DNNEstimators (with hidden units of 1000, 500, 100) on a dataset with around 1850 features (after feature engineering), and I kept running out of memory (even on larger 400 GB+ high-mem GCP VMs).
I'm using the above for binary classification. Initially I had trained various models and combined them by training a traditional ensemble classifier over the trained models. I was hoping that Adanet would simplify the generated model graph and make inference easier, rather than keeping separate graphs/pickles for the various scalers/scikit models/keras models.
Three hypotheses:
You might have too many DNNs in your ensemble, which can happen if max_iteration_steps is too small and max_iterations is not set (both are constructor arguments to AutoEnsembleEstimator). If you want to train each DNN for N steps and you want an ensemble with 2 DNNs, set max_iteration_steps=N, set max_iterations=2, and train the AutoEnsembleEstimator for 2N steps (see the sketch after this list).
You might have been on adanet-0.6.0-dev, which had a memory leak. To fix this, try updating to the latest release and see if the problem still arises.
Your batch size might have been too large. Try lowering your batch size.
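As referenced in the first hypothesis, here is a minimal sketch of that configuration (assuming a TF version where tf.estimator.BinaryClassHead is available and an adanet release with max_iterations; the head, feature columns, and step count are placeholders):

import adanet
import tensorflow as tf

# Placeholder head and feature columns; the real ones depend on the dataset.
head = tf.estimator.BinaryClassHead()
feature_columns = [tf.feature_column.numeric_column("x", shape=(1850,))]

N = 10000  # steps to train each candidate DNN

estimator = adanet.AutoEnsembleEstimator(
    head=head,
    candidate_pool={
        "dnn": tf.estimator.DNNEstimator(
            head=head,
            feature_columns=feature_columns,
            hidden_units=[1000, 500, 100]),
    },
    # Each AdaNet iteration trains the candidates for N steps, then adds
    # the best one to the ensemble.
    max_iteration_steps=N,
    # Stop growing after 2 iterations -> at most 2 DNNs in the ensemble.
    max_iterations=2)

# Train for 2N steps total (N steps per iteration, 2 iterations):
# estimator.train(input_fn=train_input_fn, max_steps=2 * N)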
I would like to train a new model using my own dataset. I will be using Darkflow/Tensorflow for it.
My questions:
(1) Should we resize our training images for a specific size?
(2) I think smaller images might save time, but can smaller images harm the accuracy?
(3) And what about the images to be predicted, should we resize them as well or is it not necessary?
(1) Yes. With random=1 in the .cfg file YOLO already resizes the images for you, so the input resolution of the images ends up the same; you can resize them yourself or let Yolo do it.
(2) If your hardware is good enough, I suggest you use large images. Also, as a suggestion: if you will use a webcam, use images at the same resolution as your webcam.
(3)Yes, same as training.
(1) Yes, neural networks have fixed input dimensions. These can be adjusted to fit your purpose, but in the end you need to commit to a defined input dimension, and thus you need to feed images fitting those dimensions. For YOLO I found the following:
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
It could be that the framework you are using already does that step for you. Maybe somebody could comment on that.
(3) The images / samples you feed during inference, for prediction, should be as similar to the training images / samples as possible. So whatever preprocessing you're doing with your training data, you should definitely do the same on your inference data; see the sketch below.
(2) Smaller images make sense if your hardware is not able to hold larger images in memory, or if you train with large batch sizes so that your hardware needs to hold multiple images in memory at once. In the end, the computational time is roughly proportional to the number of operations in your architecture, not necessarily to the image size.
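As a concrete illustration of points (1) and (3), i.e. resizing to the fixed network input and applying exactly the same preprocessing at training and inference time (a generic OpenCV sketch, not Darkflow-specific; the 416x416 size comes from the layer listing above):

import cv2
import numpy as np

NETWORK_SIZE = (416, 416)   # fixed input resolution of the network

def preprocess(image_path):
    """Identical resize/normalization for training and inference images."""
    img = cv2.imread(image_path)              # BGR, uint8
    img = cv2.resize(img, NETWORK_SIZE)       # force the network input size
    return img.astype(np.float32) / 255.0     # scale to [0, 1]

# train_batch = np.stack([preprocess(p) for p in train_paths])
# test_batch  = np.stack([preprocess(p) for p in test_paths])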
(1) No, it is not necessary. But if your dataset contains random resolutions, you can put
random = 1
in your .cfg file for better results.
(2) Smaller images don't reduce the time to converge, but if your dataset contains only small images, Yolo will probably fail to converge (Yolov3 is not a good detector for a lot of tiny objects)
(3) It is not necessary
In my face recognition project a face is represented as a 128-dimensional embedding (face_descriptor), as used in FaceNet.
I can generate the embedding from an image in two ways.
Using the Tensorflow resnet model v1:
emb_array = sess.run(embedding_layer,
{images_placeholder: images_array, phase_train_placeholder: False})
An array of images can be passed and a list of embeddings is obtained.
This is a bit slow: it took about 1.6 s (though the time stays almost constant for a large number of images).
Note: GPU not available
The other method is using dlib:
dlib.face_recognition_model_v1.compute_face_descriptor(image, shape)
This gives a fast result, almost 0.05 seconds. But only one image can be passed at a time, so the time grows with the number of images.
Is there any way to pass an array of images to calculate embeddings in dlib, or any other way to improve dlib's speed?
Or is there any other, faster method to generate the 128-dimensional face embedding?
Update:
I concatenated multiple images into a single image and passed it to dlib:
dlib.face_recognition_model_v1.compute_face_descriptor(big_image, shapes)
i.e. I converted multiple images, each with a single face, into a single image with multiple faces.
Still, the time is proportional to the number of images (i.e. the number of faces) concatenated; it is almost the same as iterating over the individual images.
One of the more important aspects of this question is that you have no GPU available. I'm putting this here so that anyone who reads this answer will have a better understanding of the context.
There are two major parts to the time consumed for inference. First is the setup time. Tensorflow takes its sweet, sweet time to set itself up when you first run the network, so your measurement of 1.6 seconds is probably 99.9999% setup time and 0.0001% processing your image. Then it does the actual inference calculation, which is probably tiny for one image compared to the setup. A better measurement would be to run 1,000 images through it, then 2,000 images, and divide the difference by 1,000 to get how much time each image takes to infer.
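A rough sketch of that measurement (run_inference is a placeholder for whichever of the two pipelines is being timed):

import time

def time_per_image(run_inference, images, n_small=1000, n_large=2000):
    """Estimate the marginal per-image inference time, excluding setup cost."""
    start = time.time()
    run_inference(images[:n_small])
    t_small = time.time() - start

    start = time.time()
    run_inference(images[:n_large])
    t_large = time.time() - start

    # The one-off setup cost cancels out in the difference.
    return (t_large - t_small) / (n_large - n_small)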
From the look of it, Dlib doesn't spend much time with setting up on the first run, but it would still be a better benchmark to do the same as outlined above.
I suspect Tensorflow and Dlib should be fairly similar in terms of execution speed on a CPU because both use optimized linear algebra libraries (BLAS, LAPACK) and there is only so much optimization one can do for matrix multiplication.
There is another thing you might want to give a try though. Most networks use 32 bit floating point calculations for training and inference, but research shows that in most cases, switching over to 8 bit integers for inference doesn't degrade accuracy too much but speeds up inference by a lot.
It is generally better to train a network with later quantization in mind, which is not the case here because you use a pre-trained model, but you can probably still benefit a lot from quantization. You can quantize your model by basically running a command that is included in Tensorflow (with the surprising name quantize_graph), but there is a little bit more to it. There is a nice quantization tutorial to follow, but keep in mind that the script now lives in tensorflow/tools/quantization and not in contrib any more, as written in the tutorial.
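For reference, a rough sketch of 8-bit weight quantization using the graph_transforms utility that ships with TensorFlow 1.x (the file path and node names are placeholders; the quantize_graph script mentioned above is the fuller route described in the tutorial):

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load the frozen graph (path and node names are placeholders).
with tf.gfile.GFile("facenet_frozen.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Quantize the weights to 8 bits; activations stay in float here.
quantized = TransformGraph(graph_def,
                           ["input"],          # input node names
                           ["embeddings"],     # output node names
                           ["quantize_weights"])

with tf.gfile.GFile("facenet_quantized.pb", "wb") as f:
    f.write(quantized.SerializeToString())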
I am trying to perform image segmentation using machine learning (SVM in particular). I am segmenting MRIs and the original images are 512x512x100. I have created 78 features per image. At that image size and number of features I quickly run out of memory.
To resolve the memory issue I have done a couple of things. 1) I downsampled the images to 256x256x50. 2) I reduced the precision to 16-bit float, since the original images are 16-bit and I didn't believe it necessary to have more precise data than that. (Maybe I'm wrong here.)
So I was able to reduce my data to an amount that fits in memory, about 6 GB. That lasted until I actually called the SVM function in sklearn, at which point my computer quickly started using swap because it had run out of RAM (16 GB). Searching around, I found in the sklearn docs (http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) that "If X and y are not C-ordered and contiguous arrays of np.float64 and X is not a scipy.sparse.csr_matrix, X and/or y may be copied." This, along with other posts on GitHub, made me realize the data was being converted up to float64 and therefore taking up all my memory; going from 16-bit to 64-bit would, from what I gather, increase the RAM needed from 6 to 24 GB, which is beyond what I have available.
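The arithmetic behind that, as a quick back-of-the-envelope check (using the array dimensions given below):

rows, cols = 39321600, 78            # samples x features
for bits, name in [(16, "float16"), (64, "float64")]:
    print(name, rows * cols * bits / 8 / 1024**3, "GiB")
# float16: ~5.7 GiB, float64: ~22.9 GiB -- roughly the 6 GB vs 24 GB above.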
Here is a simple example of the code. features is a numpy array of 39,321,600 rows (256*256*50*12 [training images]) by 78 columns (the features), and segmentations is 39,321,600 by 1, with values between 0 and 6 for the various regions of interest.
from sklearn import svm
clf = svm.SVC()
clf.fit(features, segmentations)
Above is the only code that is relevant at this point as I haven't gotten past the training portion.
Any help with either training a dataset of this size using SVM and sklearn, or any other options would be greatly appreciated.
Thanks.
Anthony.
PS. I have performed a subsampling of the data as an option (see the sketch below). Though this is not ideal, as I would like to use the whole image; if this is my best bet I guess I will pursue it.
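For reference, the kind of subsampling I mean is roughly this (a sketch only; keep_fraction and the conversion to contiguous float64 are illustrative):

import numpy as np
from sklearn import svm

keep_fraction = 0.01                          # e.g. keep 1% of the voxels
rng = np.random.RandomState(0)
n_samples = features.shape[0]                 # features/segmentations as above
keep = rng.choice(n_samples, size=int(n_samples * keep_fraction), replace=False)

# Converting up front avoids the silent float64 copy inside fit().
features_small = np.ascontiguousarray(features[keep], dtype=np.float64)
segmentations_small = np.asarray(segmentations[keep]).ravel()

clf = svm.SVC()
clf.fit(features_small, segmentations_small)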