Training YOLOv7 on CPU raises a CUDA error - python

I am trying to train a YOLOv7 model without a GPU. This is the command line I am currently using on Colab:
python train_aux.py --workers 1 --device cpu --batch-size 1 --data data/coco.yaml --img 128 128 --cfg /content/yolov7/cfg/training/yolov7-e6e.yaml --weights '' --name yolov7-e6e --hyp data/hyp.scratch.p6.yaml
For some reason I first get a warning
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
and then I get the error
RuntimeError: No CUDA GPUs are available
during the first epoch. I don't understand why it is trying to use CUDA when I am running it on the CPU. Am I missing some spot that I have to edit in the code to fix this? Here is the link to the GitHub repository that I am using.
I have tried installing the CUDA library in case that helped, using
!pip install cuda-python
but it didn't solve the issue.

So it looks like this issue is due to CUDA being hard-coded into the model for certain procedures. A more in-depth explanation can be found here: link. In the meantime, removing --device cpu for some reason fixed it.
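If you do want to force CPU training, the usual fix is to replace the hard-coded .cuda() calls with a device check; here is a minimal sketch of the pattern (the exact lines to patch in the YOLOv7 repo vary):

import torch

# Pick CUDA only when it is actually available, otherwise fall back to CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Move tensors with .to(device) instead of calling .cuda() directly.
x = torch.zeros(1).to(device)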

Related

MemoryError Precedes BrokenPipeError While Training CNN

I downloaded a CNN model I want to use from GitHub.
The first two lines of execution,
python generate_dataset.py --is_train=True --use_phase=True --chip_size=100 --patch_size=94 --use_phase=True --dataset=soc
python generate_dataset.py --is_train=False --use_phase=True --chip_size=128 --patch_size=128 --use_phase=True --dataset=soc
executed successfully. But while running
python train.py --config_name=config/AConvNet-SOC.json
it raises a MemoryError.
The publisher of the above repository used 32 GB RAM and an 11 GB GPU, but I have 8 GB RAM and an 8 GB GPU.
Here is what I have done:
I thought of running it without a cache, like
python train.py --config_name=config/AConvNet-SOC.json --no-cache-dir
but it throws the error below:
FATAL Flags parsing error: Unknown command line flag 'no-cache-dir' Pass --helpshort or --helpfull to see help on flags.
I think this is because the no-cache-dir argument is not defined in the script via absl.flags. Does Python support a no-cache-directory implementation?
I am able to solve it by decreasing the number of epochs and the batch_size, but I want to run it for the full number of epochs.
Using zero_grad() of PyTorch makes the gradients zero for every minibatch, so that the GPU won't run out of memory. But it is already used in the code I am running, in _base.py. Is there any way I can leverage this further?
How can I resolve this?
Generally, for running pip with no cache we use --no-cache-dir, like
pip install pytorch --no-cache-dir.
It is a pip flag, not a general Python option, so it cannot be passed to an arbitrary training script.
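To run the full number of epochs within 8 GB, one common workaround (a minimal sketch with a dummy model and data, not code from the AConvNet repo) is gradient accumulation: keep the small batch size but step the optimizer only every few mini-batches, which mimics a larger effective batch without the extra memory.

import torch
from torch import nn

# Dummy stand-ins for the real model and dataloader.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
data = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(8)]

accum_steps = 4  # effective batch = 4 * accum_steps = 16
optimizer.zero_grad()
for i, (x, y) in enumerate(data):
    # Scale the loss so the accumulated gradients match a big-batch average.
    loss = criterion(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate across mini-batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()       # update once per effective batch
        optimizer.zero_grad()  # then reset, as zero_grad() does per batch in _base.py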

'Paging file too small for this operation to complete' Error when attempting to train YOLOv5 object detection model

I have ~50,000 images and annotation files for training a YOLOv5 object detection model. I've trained a model with no problems using just the CPU on another computer, but it takes too long, so I need GPU training. My problem is that when I try to train with a GPU I keep getting this error:
OSError: [WinError 1455] The paging file is too small for this operation to complete
This is the command I'm executing:
train.py --img 640 --batch 4 --epochs 100 --data myyaml.yaml --weights yolov5l.pt
CUDA and PyTorch have successfully been installed and are available. The following command installed with no errors:
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
I've found other people online with similar issues who fixed it by changing num_workers = 8 to num_workers = 1. When I tried this, training started and seemed to get past the point where the "paging file is too small" error appears, but then crashed a couple of hours later. I've also increased the available virtual memory as per this video (https://www.youtube.com/watch?v=Oh6dga-Oy10), but that also didn't work. I think it's a memory issue, because some of the times it crashes I get a low-memory warning from my computer.
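For reference, recent versions of the YOLOv5 repo expose the worker count as a train.py command-line flag (the flag name is assumed from the repo's argparse options), so the num_workers change does not require editing the source:

python train.py --img 640 --batch 4 --epochs 100 --data myyaml.yaml --weights yolov5l.pt --workers 1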
Any help would be much appreciated.
So I've managed to fix my specific problem and thought posting the answer here might help someone else. Basically, I don't think I had enough RAM. I was using 8 GB before; I've upgraded to 32 GB and it's working fine.
As I wrote in the question above, I thought it was a memory issue, and I had got it to work on another computer using only the CPU. I also noticed that when training started there was a spike in RAM usage. This guy also explains the importance of RAM when training deep learning models on large datasets:
https://timdettmers.com/2018/12/16/deep-learning-hardware-guide/
Hope this can help other people with the same issue.
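If you want to confirm that RAM is the bottleneck before upgrading, a quick check (a minimal sketch, assuming psutil is installed) is to print system memory use while the dataloader spins up:

import psutil

# Report how much system RAM is in use; watch this climb as training starts.
mem = psutil.virtual_memory()
print(f"RAM used: {mem.used / 1e9:.1f} GB of {mem.total / 1e9:.1f} GB ({mem.percent}%)")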

Converted ONNX model runs on CPU but not on GPU

I converted a TensorFlow Model to ONNX using this command:
python -m tf2onnx.convert --saved-model tensorflow-model-path --opset 10 --output model.onnx
The conversion was successful and I can run inference on the CPU after installing onnxruntime.
But when I create a new environment, install onnxruntime-gpu in it, and run inference on the GPU, I get different error messages depending on the model. E.g. for MobileNet I receive W:onnxruntime:Default, cuda_execution_provider.cc:1498 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: Conv node name: StatefulPartitionedCall/mobilenetv2_1.00_224/Conv1/Conv2D
I tried out different opsets.
Does someone know why I am getting errors when running on the GPU?
That is not an error. That is a warning and it is basically telling you that that particular Conv node will run on CPU (instead of GPU). It is most likely because the GPU backend does not yet support asymmetric paddings and there is a PR in progress to mitigate this issue - https://github.com/microsoft/onnxruntime/pull/4627. Once this PR is merged, these warnings should go away and such Conv nodes will run on the GPU backend.
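If you want to see which execution providers your onnxruntime build offers, and which ones a session actually registers, you can check like this (the model path is a placeholder):

import onnxruntime as ort

# Providers compiled into this onnxruntime build.
print(ort.get_available_providers())

# Ask for CUDA first, falling back to CPU for unsupported nodes.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())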

How to get allocated GPU spec in Google Colab

I'm using Google Colab for deep learning and I'm aware that they randomly allocate GPUs to users. I'd like to be able to see which GPU I've been allocated in any given session. Is there a way to do this in Google Colab notebooks?
Note that I am using Tensorflow if that helps.
Since you can run bash commands in Colab, just run !nvidia-smi.
This makes it easier to read:
!nvidia-smi -L
Run these two commands in Colab.
CUDA: let's check that the Nvidia CUDA drivers are already pre-installed and, if so, which version:
!/usr/local/cuda/bin/nvcc --version
!nvidia-smi
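Since the question mentions TensorFlow, you can also ask TensorFlow itself which GPU it sees in the current session (an empty device list means no GPU was allocated):

import tensorflow as tf

print(tf.test.gpu_device_name())               # e.g. '/device:GPU:0'
print(tf.config.list_physical_devices('GPU'))  # [] if no GPU is allocated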

CPU usage is more than GPU usage on Keras, Tensorflow

I wanted to train an image classification CNN and use Keras for it.
The image dimensions are 300x300x3.
I have trained a CNN with 2M parameters; I used Keras's MobileNet for transfer learning, but I froze the last 63 layers and added dense layers at the bottom, with the last layer having 2 units and Softmax activation.
To make predictions, I load the h5 file and use OpenCV video capture to get video frames; for each frame I call model.predict(img_array).
When I look at the Task Manager of Windows 10, I see that the Python script uses 80% of my processor but only 2% of the GPU. This CPU usage causes lags on my laptop.
How can I reduce the CPU usage and force Keras to make computations with GPU?
I have Nvidia Rtx 2060 4GB and Intel Core i7-9750H on my laptop.
Tensorflow 2.1 and Keras 2.3.1
OpenCV 4.1
I have tried the following, but actually nothing changes:
tf.config.threading.set_inter_op_parallelism_threads(12)
tf.config.threading.set_intra_op_parallelism_threads(12)
with tf.device('/gpu:0'):
    model.predict(img_array)
Best regards.
Edit:
I reduced the CPU usage to 20% by declaring the steps parameter in the predict method.
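For anyone trying the same edit, the call looks something like this (the steps value here is illustrative):

preds = model.predict(img_array, steps=1)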
Please check your pip list or conda list.
Sometimes, we mistakenly install both tensorflow and tensorflow-gpu.
If you have both, the system will automatically go for tensorflow, which is the CPU one.
If that is the case, DELETE "tensorflow", keeping only "tensorflow-gpu".
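To verify which build is actually active, a quick sanity check (these calls exist on TF 1.x and early 2.x) is:

import tensorflow as tf

print(tf.test.is_built_with_cuda())  # False means a CPU-only build
print(tf.test.is_gpu_available())    # True once tensorflow-gpu and the drivers line up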
If you do not see tensorflow-gpu in the first place, try installing it on conda using the following commands:
conda create -n [EnvironmentName] python=3.6
conda activate [EnvironmentName]
conda install -c conda-forge tensorflow-gpu==1.14
It will assess which versions (CUDA, cuDNN, etc.) you require and download and install them directly into your environment. Then run your Python file from this environment. Good luck ^_^
