python keeps crashing whenever CNN model is being trained


I am really looking for your help.
I have a GTX 1070, which has 8 GB of VRAM.
I installed tensorflow-gpu, CUDA 9.0, and cuDNN 7.0 for CUDA 9.0.
Everything works fine with a DNN, and the GPU is working fine too.
But whenever I try to train any model that works with images, Python crashes.
Currently I am working with the Keras pre-trained VGG16.
I tried using a smaller batch size and resizing the images down to 64x64.
When I watch the process, GPU usage sits at 0%, spikes to 100%, and then it crashes.
Spyder says "kernel died, restarting".
Is the GTX 1070 really that short of memory, or am I missing something?
Thanks for reading.

The first thing I would try is to download cuDNN 7.1.
These are good instructions to follow, and you may want to reinstall CUDA 9 as well. I had to do the same at one point; it was frustrating, but I haven't had a problem since I got it right.
Installation Instructions

I had a similar crashing problem before. The cause was a version mismatch between my cuDNN 7.1 and tensorflow-gpu (precompiled against cuDNN 7.0.5). Once I fixed that, the problem went away.
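A quick way to spot this kind of mismatch is to compare the cuDNN version your TensorFlow build was compiled against with the one you installed. The sketch below is only a conservative heuristic of my own (it flags any difference in major.minor), not TensorFlow's actual loading rule. The helper names are made up for illustration, and tf.sysconfig.get_build_info() only exists in newer TensorFlow 2.x releases, so on an old tensorflow-gpu 1.x install you would have to look the build versions up in the release notes instead.

```python
import re


def parse_version(text):
    """Return (major, minor) parsed from a string such as '7.0.5' or '7'."""
    parts = re.findall(r"\d+", text)
    if not parts:
        raise ValueError(f"no version number in {text!r}")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def same_major_minor(built_against, installed):
    """Heuristic (an assumption, not TensorFlow's rule): True when both
    versions share the same major and minor numbers."""
    return parse_version(built_against) == parse_version(installed)


def tensorflow_build_cudnn():
    """Return the cuDNN version the installed TensorFlow was built with,
    or None if TensorFlow is missing or too old to report it."""
    try:
        import tensorflow as tf
        info = tf.sysconfig.get_build_info()
    except (ImportError, AttributeError):
        return None
    version = info.get("cudnn_version")
    return str(version) if version is not None else None
```

For example, same_major_minor("7.0.5", "7.1") returns False, which matches the situation described above.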

Related

TensorFlow does not recognize gpu. It recognizes only one thing

All GPU versions are the same.
But TensorFlow recognizes only GPU No. 1.
I want to know why this is happening.
I use TensorFlow 2.5.
We've looked at it in many ways, but we've found similar cases.
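One thing worth checking first is what TensorFlow itself reports, and whether the CUDA_VISIBLE_DEVICES environment variable is hiding some of the cards. This is only a diagnostic sketch, not a fix. The two function names are my own, while tf.config.list_physical_devices is the TensorFlow 2.x call for listing devices.

```python
import os


def visible_device_setting(env=None):
    """Return the CUDA_VISIBLE_DEVICES value, or None when it is unset.

    When set (for example to "0"), CUDA programs only see the listed GPUs.
    """
    env = os.environ if env is None else env
    return env.get("CUDA_VISIBLE_DEVICES")


def list_tensorflow_gpus():
    """Return the names of the GPUs TensorFlow can see ([] if TF is missing)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]
```

If visible_device_setting() returns something like "0", that alone would explain why only one GPU shows up in list_tensorflow_gpus().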

Could not load library cudnn_cnn_infer64_8.dll. Error code 126 Please make sure cudnn_cnn_infer64_8.dll is in your library path a

I want to train a model to recognize various techniques. I built the model in darknet YOLOv4 on Windows 10 and installed everything: VS 2019, CMake, OpenCV, CUDA, and cuDNN. I added CUDA and cuDNN to the environment variables and to the Path. When I start training the model, this error appears.

It happened to me on YOLOv4 too. For some reason, cuDNN 8.3.0 is not compatible. I installed cuDNN 8.2.2 and my problem was solved.
cudnn-11.4-windows-x64-v8.2.2.26
Download it from the Nvidia developer website.
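Before swapping versions, it can help to confirm whether the DLL named in the error is actually reachable through the Path. Here is a minimal sketch using only the standard library. The function name find_on_path is hypothetical, and it only checks that the file exists in a Path directory, not that Windows can load it.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory in `path` (default: PATH) containing `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Skip empty entries produced by stray separators in PATH.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits
```

Calling find_on_path("cudnn_cnn_infer64_8.dll") returns an empty list when no Path directory holds the file. If it does find the file, keep in mind that error code 126 can also appear when the DLL exists but one of its own dependencies cannot be loaded.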

Download only specific part of Tensorflow Library

I have deep learning code for object detection. I ran the code on Google Colab and then exported the model to use it locally. Now, to run the model, I have to install the whole TensorFlow package again, which is quite heavy for my system.
Is there a way to download and run only specific parts of the TensorFlow library?
I use TensorFlow in only two places in my code, yet I have to install the whole library for them.
This is where I load the model:
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)
This is the second place I use TensorFlow:
input_tensor = tf.convert_to_tensor(image_rgb)
These are the only two functions I need from the TensorFlow library, not the whole library. Thanks in anticipation.

Though I'm not entirely sure about the library as a whole, there is a Lite version of TensorFlow (I guess they realised 430MB is a bit much too).
Information regarding this can be found here:
https://www.tensorflow.org/lite/
A guide here seems to detail how to pick and choose parts of the Lite library. I haven't used it myself, but I would expect some degree of compatibility between the two.
https://www.tensorflow.org/lite/guide/reduce_binary_size

'Paging file too small for this operation to complete' Error when attempting to train YOLOv5 object detection model

I have ~50000 images and annotation files for training a YOLOv5 object detection model. I've trained a model no problem using just CPU on another computer, but it takes too long, so I need GPU training. My problem is, when I try to train with a GPU I keep getting this error:
OSError: [WinError 1455] The paging file is too small for this operation to complete
This is the command I'm executing:
train.py --img 640 --batch 4 --epochs 100 --data myyaml.yaml --weights yolov5l.pt
CUDA and PyTorch have successfully been installed and are available. The following command installed with no errors:
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
I've found other people online with similar issues who fixed it by changing num_workers = 8 to num_workers = 1. When I tried this, training started and seemed to get past the point where the "paging file is too small" error appears, but then it crashed a couple of hours later. I've also increased the virtual memory available on my GPU as per this video (https://www.youtube.com/watch?v=Oh6dga-Oy10), but that didn't work either. I think it's a memory issue because some of the times it crashes I get a low memory warning from my computer.
Any help would be much appreciated.

So I've managed to fix my specific problem and thought posting the answer here might help someone else. Basically, I don't think I had enough RAM. I was using 8 GB before and I've upgraded to 32GB and it's working fine.
As I wrote in the question above, I thought it was a memory issue and I got it to work on another computer only using CPU. I also noticed that when training started there was a spike in RAM usage. This guy also explains the importance of RAM when training deep learning models on large datasets:
https://timdettmers.com/2018/12/16/deep-learning-hardware-guide/
Hope this can help other people with the same issue.
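If upgrading RAM is not an option, one rough way to reason about the num_workers setting mentioned in the question is to budget memory per data-loading worker. The sketch below is only a back-of-the-envelope heuristic. The function name and every number passed to it are hypothetical, and the real per-worker footprint has to be measured on your own machine (for example in Task Manager while training runs).

```python
def pick_num_workers(free_ram_gb, per_worker_gb, reserve_gb=2.0, max_workers=8):
    """Suggest a DataLoader worker count that fits a RAM budget.

    free_ram_gb   -- RAM currently free on the machine (measure it yourself)
    per_worker_gb -- assumed memory each worker process needs (hypothetical)
    reserve_gb    -- headroom kept for the main training process and the OS
    """
    budget = free_ram_gb - reserve_gb
    if budget <= 0 or per_worker_gb <= 0:
        return 0  # 0 means "load data in the main process" in PyTorch
    return max(0, min(max_workers, int(budget // per_worker_gb)))
```

With an assumed 2 GB per worker, pick_num_workers(8, 2) suggests 3 workers, while pick_num_workers(32, 2) is capped at 8, which is in line with 8 GB being tight and 32 GB working fine.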

Is there a way to use a compiled keras model on the RPI Zero?

I am working on a letter recognition application for a robot. I trained the model on my home PC and want the recognition to run on the RPI Zero W with the already trained model.
I got an HDF model. When I try to install TensorFlow on the RPI Zero, it throws a hash error; as far as I could find out, this is because TF is built for 64-bit machines. When I try to install TensorFlow Lite, the installation stalls and crashes.
For saving the model I use:
classifier.save('test2.h5')
These are the prediction lines:
test_image = ks.preprocessing.image.load_img('image.jpg')
test_image = ks.preprocessing.image.img_to_array(test_image)
result = classifier.predict(test_image)
I also tried to compile the Python script with Nuitka, but since the RPI is ARM and Nuitka does not offer cross-compilation, that option fell through.

You can use the already available TFLite to solve your issue.
If that does not help, you can also build TFLite from source.
Please refer to the links below:
https://www.tensorflow.org/lite/guide/build_rpi
https://medium.com/#haraldfernengel/compiling-tensorflow-lite-for-a-raspberry-pi-786b1b98e646
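Roughly, the TFLite route looks like the sketch below: convert the .h5 file once on the training PC, then run only the small TFLite runtime on the Pi. This assumes TensorFlow 2.x on the PC and that a tflite_runtime package can be installed or built for the Pi Zero, which is not guaranteed for its ARMv6 CPU. The two function names are made up for illustration, and the image array is expected to be already resized to the model's input shape.

```python
def convert_h5_to_tflite(h5_path, tflite_path):
    """Run on the training PC: convert a saved Keras model to a .tflite file."""
    import tensorflow as tf  # full TensorFlow is only needed for conversion

    model = tf.keras.models.load_model(h5_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    with open(tflite_path, "wb") as handle:
        handle.write(converter.convert())


def predict_with_tflite(tflite_path, image_array):
    """Run on the Pi: classify one image with the standalone TFLite runtime."""
    import numpy as np
    from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    input_info = interpreter.get_input_details()[0]
    output_info = interpreter.get_output_details()[0]

    # Add the batch dimension and match the dtype the model expects.
    batch = np.expand_dims(image_array, axis=0).astype(input_info["dtype"])
    interpreter.set_tensor(input_info["index"], batch)
    interpreter.invoke()
    return interpreter.get_tensor(output_info["index"])
```

On the PC you would call convert_h5_to_tflite('test2.h5', 'test2.tflite'), copy the .tflite file to the Pi, and call predict_with_tflite there. The imports sit inside the functions so that the Pi never needs the full TensorFlow package.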
