Dlib not using GPU on Google Colab - python

How do I force training to run on the GPU?
Currently it only uses the CPU, even though dlib.DLIB_USE_CUDA evaluates to True.
print(dlib.cuda.get_num_devices()) also prints 1.
Here's an attached image showing that nothing is running on the GPU even while my code is running:
NOTE: The Colab runtime type was set to GPU.
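
For reference, the checks above amount to the following minimal sketch (dlib.cuda.set_device is an optional extra, assuming a CUDA-enabled dlib build):

    import dlib

    # True only if dlib was compiled with CUDA support
    print(dlib.DLIB_USE_CUDA)

    # Number of CUDA devices dlib can see (1 on a standard Colab GPU runtime)
    print(dlib.cuda.get_num_devices())

    # Optionally pin dlib to a specific device before training
    dlib.cuda.set_device(0)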

Comment:
From what I've tested, this wasn't a training error but a loading error: it takes a lot of time and RAM to load the ibug-300W files. Is there any way to load them faster?
For anyone who stumbles upon this slow-training issue on Google Colab: the way to load the data faster is to copy the dataset directly onto the VM storage (/content) of Colab, because the transfer speed between Drive and Colab is slow.
PS: You need at least 14-15 GB of RAM to load the ibug-300W files.
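
A rough sketch of that copy step (the Drive path below is a placeholder for wherever your ibug-300W data actually lives):

    import shutil

    # Copy the dataset from the mounted Drive onto the Colab VM's local disk;
    # local reads are much faster than streaming the files from Drive.
    shutil.copytree('/content/drive/MyDrive/ibug_300W',  # hypothetical source path
                    '/content/ibug_300W')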

Related

Run code on GPU instead of CPU with detecto

I am using machine learning with detecto in Python. However, whenever I run it, I get a warning saying
It looks like you're training your model on a CPU. Consider switching to a GPU; otherwise,
this method can take hours upon hours or even days to finish. For more information, see
https://detecto.readthedocs.io/en/latest/usage/quickstart.html#technical-requirements
I have a GPU in the form of an Intel(R) HD Graphics 4600, but for some reason the code is running on the CPU. I have checked the link it gives, which says
By default, Detecto will run all heavy-duty code on the GPU if it’s available and on the CPU otherwise.
It recommends using Google Colab if the computer doesn't have a GPU it can use, but I do have one, and I don't want to use Google Colab.
Why is it running on the CPU instead of the GPU? And how can I fix it? The part of my code where I get the warning is
losses = fitmodel(loader, Test_dataset, epochs=25, lr_step_size=5,
learning_rate=0.001, verbose=True)
The code does work; however, it takes ages to run, so I want to be able to run it on the GPU to save time.
The GPU that Detecto is referring to would need to be a CUDA-capable Nvidia GPU, so your Intel(R) HD Graphics 4600 does not meet this criterion.
Detecto uses PyTorch internally, whose GPU support is based on CUDA. So in order to use a GPU, you would need to move to a machine that has a CUDA-capable card.
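
You can verify this from Python before training; this is a plain PyTorch check, nothing Detecto-specific:

    import torch

    # Detecto only moves work to the GPU when PyTorch reports a CUDA device.
    # On an Intel HD Graphics 4600 this prints False, so everything stays on the CPU.
    print(torch.cuda.is_available())
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))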

Tensorflow 2.7 GPU Memory not released

I am currently working on 1D convolutional neural networks for time series classification. Recently, I got CUDA working on my GeForce 3080 (which was a pain in itself). However, I noticed weird behavior when using TensorFlow and CUDA: after training a model, the GPU memory is not released, even after deleting the variables and running garbage collection. I tried resetting the TF graph and closing the TF sessions, but the GPU memory stays allocated. This results in cross-validation crashing, and I have to restart my Python environment every time I want to make changes and retrain my model.
After a tedious search, I found out that people were already struggling with this 5 years ago. However, I am now using TF 2.7, working on Ubuntu 20.04.3. Some of my colleagues are using Windows and are not experiencing these problems; it seems they have no issues with models failing to retrain because of already-allocated memory.
I found the workaround of using multiple processes (see the sketch after this question), but wasn't able to get it to work for my model using 10-fold CV.
As the issue has been open for more than 5 years now and my colleagues are not having any problems, I was wondering if I am doing something wrong. An issue like that has very likely been fixed after 5 years, which is why I think my code is the problem here.
Is there any solution or guide for TF 2.7 and GPU memory allocation?
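
A hedged sketch of the multi-process workaround mentioned above: run each fold's training in a child process so the CUDA context (and all GPU memory) is torn down when that process exits. train_fold is a placeholder for your own model-building and training code:

    import multiprocessing as mp

    def train_fold(fold_idx, queue):
        # Import TF inside the child so the CUDA context lives and dies here.
        import tensorflow as tf
        # ... build the model and train it on fold `fold_idx` ...
        queue.put(fold_idx)  # placeholder: put the fold's history/score here

    if __name__ == '__main__':
        ctx = mp.get_context('spawn')  # 'spawn' avoids inheriting CUDA state
        for fold in range(10):
            q = ctx.Queue()
            p = ctx.Process(target=train_fold, args=(fold, q))
            p.start()
            result = q.get()  # read before join() to avoid queue deadlocks
            p.join()          # GPU memory is released once the child exits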

Is there any way to train numpy neural networks faster?

I implemented a neural network class using only Python and NumPy, and I want to do some experiments with it. The problem is that it takes very long to train. My computer does not have a high-end GPU or a great CPU, so I thought about some sort of 'cloud training'.
I know libraries such as TensorFlow and PyTorch use backends to train neural networks faster, and I was wondering whether something similar could be achieved with NumPy. Is there a way to run NumPy in the cloud?
Even something slow that doesn't use GPUs would be fine for me. I tried to load my files into Google Colab, but it didn't work so well: it stopped running due to inactivity after some time.
Is there any nice solution out there?
Thanks for reading it all!
Try CuPy instead of NumPy: it runs on the GPU (and works well on a Colab GPU instance), and you may only need to make small modifications to your code.
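
For example, if the network is written against the NumPy API, switching the import is often most of the work (CuPy mirrors a large part of NumPy, though odd corners may still need changes):

    import cupy as cp  # drop-in stand-in for `import numpy as np` on a CUDA GPU

    # The usual NumPy-style operations run on the GPU unchanged.
    W = cp.random.randn(784, 128)
    x = cp.random.randn(32, 784)
    h = cp.tanh(x @ W)

    # Move results back to the host when a real NumPy array is needed.
    h_host = cp.asnumpy(h)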

SampleRNN - Pytorch implementation beginner

I'm trying to start work with this: https://github.com/deepsound-project/samplernn-pytorch
I've installed all the library dependencies through the Anaconda console, but I'm not sure how to run the Python training scripts.
I guess I just need general help with getting a git RNN project working in Python? I've found a lot of tutorials that show how to work from notebooks in Jupyter or even from scratch, but I can't find ones that work from Python code files.
I'm sorry if my terminology is backward; I'm an architect who is attempting coding, not a software engineer.
There are instructions for getting the SampleRNN implementation working in the terminal on the git page. All of the commands listed there are for calling the Python scripts from the terminal, not from a Jupyter notebook. If you've installed all the correct dependencies, then in theory all you should need to do is run those terminal commands to try it out.
FYI, it took me a while to find a combination of parameters with which this model would train without running into memory errors, but I was working with my own dataset, not the one provided. It's also very intensive: the default training time is 1000 epochs, which even on my relatively capable GPU was prohibitively long, so you might want to reduce that value considerably just to reach the end of a training cycle, unless you have a sweet setup :)
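
For a concrete starting point, the repo's training entry point is train.py, invoked from the terminal roughly like this (the flag names below are from memory of the README, so double-check them there):

    # List all configurable hyperparameters:
    python train.py -h

    # Example run; check the README for the exact flags and dataset name:
    python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset piano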

Lagging System or a possible bug in TensorFlow?

I am currently doing R&D in TensorFlow (CPU version), but I am unable to decide on the basic system requirements for training on large datasets; alternatively, I may have stumbled upon a possible bug in the TensorFlow library.
The official TensorFlow documentation nowhere suggests any specific system requirements for building and running TensorFlow programs. From what I can understand, if it can run on Windows, Linux, and Mac, along with Android, iOS, and even embedded systems like the Raspberry Pi, I suppose there should not be any particular hardware requirement.
However, during my initial research I tried running the TensorFlow seq2seq model (translating English to French, https://www.tensorflow.org/tutorials/seq2seq), where the training and test datasets take around 7-8 GB of disk space initially and 20-22 GB in total. Once the translate.py script is executed, it ends up choking the memory and pushing disk utilization to 98-100%.
My current system runs Windows 8.1 64-bit, with a Core i5-5200U clocked at 2.2 GHz, 8 GB of RAM, and around 70 GB of free space on the HDD (specifically allotted for TensorFlow usage). But even after letting my system run for 7-8 hours (with no other application running), it got stuck multiple times, usually after memory utilization peaked at around 100% after tokenizing the datasets.
Though I am not sure, I suppose the TensorFlow learning graph is being created in RAM, and once it expands to fill nearly all the memory, the program ends up in an unending loop, waiting for memory to be cleared so it can grow the graph further.
So the whole thing boils down to 3 questions:
Does TensorFlow use RAM for building and saving the learning graph? If so, can it get choked in a similar fashion?
From a business perspective, is there a minimum hardware requirement for training such a system?
If it is not a system-requirements issue, could this be a bug in the TensorFlow library that pushes it into an unending loop, waiting for memory to get cleared?
Update
After running the Python script continuously for over 30 hours, the process seems to have been stuck at the same place, "Reading development and training data", for the past 14 hours. Refer to the image below for further investigation:
Just as I was about to shut the program down, it started responding again; I waited another 15-20 minutes and finally got the answer from the OS itself: it was indeed low RAM that was causing the problem. Attaching a screen grab of the Windows alert about the system running low on memory for reference, in case anyone gets caught in the same situation.
UPDATE
I tried taking a VM instance on Google Cloud Platform. That machine had 2 Intel Xeon(R) CPUs, each running at 2.23 GHz, with 13 GB of RAM and 50 GB of storage. But the result was the same there too, even though the application was utilising more than 10.5 GB of RAM. It seems this tutorial script needs a very hefty system, probably one with at least 32 GB of RAM, to run and execute completely. I might look at writing/arranging my own dataset now. However, a future enhancement should be to create the graph on persistent storage (HDD/SSD) instead of in RAM, so as to avoid choking the memory.
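
If anyone wants to confirm the same diagnosis on their own machine, watching the process's resident memory around the expensive phases is enough. A small helper (uses the third-party psutil package):

    import os
    import psutil

    def log_memory(tag=''):
        # Resident set size of the current process, in GB
        rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
        print(f'[{tag}] RSS: {rss_gb:.2f} GB')

    # Call it around the expensive phases, e.g.:
    # log_memory('before tokenizing')
    # ... tokenize the datasets ...
    # log_memory('after tokenizing')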
