How to get allocated GPU spec in Google Colab - python

I'm using Google Colab for deep learning and I'm aware that it randomly allocates GPUs to users. I'd like to be able to see which GPU I've been allocated in any given session. Is there a way to do this in Google Colab notebooks?
Note that I am using TensorFlow, if that helps.

Since you can run bash commands in Colab, just run !nvidia-smi:

This makes the output easier to read:
!nvidia-smi -L
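Since the question mentions TensorFlow, you can also check from Python; a minimal sketch using the TensorFlow 2.x API:
import tensorflow as tf
# Lists the physical GPUs TensorFlow can see in this session
print(tf.config.list_physical_devices('GPU'))
# Returns a device string such as '/device:GPU:0', or '' if no GPU is allocated
print(tf.test.gpu_device_name())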

Run these two commands in Colab.
CUDA: let's check that the Nvidia CUDA drivers are already pre-installed, and which version it is:
!/usr/local/cuda/bin/nvcc --version
!nvidia-smi
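If you want the GPU name in a Python variable rather than just printed, here is a small sketch that shells out to nvidia-smi (it assumes the driver is present, as it is on Colab GPU runtimes):
import subprocess
# nvidia-smi -L prints one line per GPU, e.g. "GPU 0: Tesla T4 (UUID: ...)"
gpu_info = subprocess.check_output(['nvidia-smi', '-L']).decode()
print(gpu_info.strip())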

Related

Training YOLOv7 on CPU produces a CUDA error

I am trying to train a YOLOv7 model without a GPU. This is the command line that I am currently using on Colab:
python train_aux.py --workers 1 --device cpu --batch-size 1 --data data/coco.yaml --img 128 128 --cfg /content/yolov7/cfg/training/yolov7-e6e.yaml --weights '' --name yolov7-e6e --hyp data/hyp.scratch.p6.yaml
For some reason I first get a warning
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
and then I get the error
RuntimeError: No CUDA GPUs are available
during the first epoch. I don't understand why it is trying to use CUDA when I am running it on the CPU. Am I missing some spot in the code that I have to edit to fix this? Here is the link to the GitHub repository that I am using.
I have tried installing the CUDA library, in case that helped, using
!pip install cuda-python
but it didn't solve the issue.
So it looks like this issue is due to CUDA being hard-coded into the model for certain procedures. A more in-depth explanation can be found here (link). In the meantime, removing --device cpu fixed it, for some reason.
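If you do need CPU-only runs, the usual fix is to select the device dynamically instead of hard-coding .cuda() calls; a minimal device-agnostic PyTorch sketch (a toy model for illustration, not YOLOv7's actual code):
import torch
# Fall back to CPU when CUDA is unavailable instead of calling .cuda() unconditionally
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.nn.Linear(4, 2).to(device)  # placeholder model, not YOLOv7
x = torch.randn(1, 4, device=device)
print(model(x))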

Can I create a docker in Google Colab?

I want to create a version of my Colab notebook that will be immune to changes in the standard versions of Python and PyTorch used within Colab. Essentially, I want something like a Docker image that will not need to be updated.
Ideally I'd like to keep them as:
Python version: 3.7
PyTorch version: 1.10.0+cu111
CUDA version: 11.1
cuDNN version: 8005
Is this possible?
I don't know anything about Google Colab, but in general you can achieve this by pinning the versions of:
the Docker image you build: always go off your custom Docker image, meaning any changes to Python and package versions won't affect an already-built image (link)
the packages inside your Dockerfile, so that rebuilds ALWAYS use the same versions
This could solve your use case.
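Note that you cannot pin Colab's own Python interpreter from inside a notebook, but you can pin package versions at the top of it; a hedged sketch (the +cu111 wheel tags are taken from the question and assume PyTorch keeps hosting them):
!pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html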

Create a custom SageMaker image with recent Python release

I am using Sagemaker Notebook Instances on AWS.
Looks like we can only use Python 3.6 kernels.
I would like to be able to use Python 3.10 (latest version, or at least Python 3.9) in a notebook.
So far, what I have tried is based on lifecycle configurations: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi-create-sdk.html
But somehow it didn't work (I was not able to use the recent kernel in the notebook).
I have found an interesting link: https://github.com/aws-samples/sagemaker-studio-custom-image-samples
but my knowledge is a bit limited and I do not know exactly which example I should follow.
Any advice/leads you could suggest, please?
Thanks
The SageMaker Data Science kernel supports Python 3.6 at the moment.
If you need a persistent custom kernel in SageMaker Studio, you can create an ECR repository and build a Docker image with custom environment configurations. This image can then be attached to SageMaker Studio notebooks. See the reference link.
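For reference, the flow in those samples boils down to pushing a Docker image to ECR and registering it with SageMaker; a hedged sketch with placeholder names and ARNs (replace <account>, <region> and the role with your own, and log in to ECR first with aws ecr get-login-password | docker login):
docker build -t custom-py310 .
docker tag custom-py310 <account>.dkr.ecr.<region>.amazonaws.com/custom-py310:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/custom-py310:latest
aws sagemaker create-image --image-name custom-py310 --role-arn <execution-role-arn>
aws sagemaker create-image-version --image-name custom-py310 --base-image <account>.dkr.ecr.<region>.amazonaws.com/custom-py310:latest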

Confused with setting up ML and DL on GPU

My goal is to set up my PC for machine and deep learning on my GPU. I've read about all the different components, but I cannot connect the dots on what I need to do.
OS: Ubuntu 20.04
GPU: Nvidia RTX 2070 Super
Anaconda: 4.8.3
I've installed the nvidia-cuda-toolkit (10.1.243), but now what?
How does this integrate with jupyter notebook?
The 3 python modules I want to work with are:
turicreate - I've gotten this to run off CPU but not GPU
scikit-learn
tensorflow
matlab
I know cuDNN and PyCUDA fit in there somewhere.
Any help is appreciated. Thanks
First of all, my experience is limited to Ubuntu 18.04 and 16.xx and the Python DL frameworks, but I hope some suggestions will be helpful.
If I were familiar with Docker, I would consider using it instead of setting everything up from scratch. This approach is described in the section about the TensorFlow container.
If you decide to set up all the components yourself, please see this guideline.
I used some of its contents for 18.04, successfully.
Be careful with automatic updates: after the configuration is finished and tested, protect it from being overwritten by newer versions of CUDA or TensorRT.
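On Ubuntu you can hold the relevant packages, for example; a hedged sketch (package names vary with how CUDA was installed, so check dpkg -l | grep -i cuda first):
# Prevent apt upgrades from replacing the tested CUDA toolkit
sudo apt-mark hold nvidia-cuda-toolkit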
Answering one of your sub-questions - How does this integrate with Jupyter notebook? - it does not, because that is unnecessary. The CUDA library cooperates with a framework such as TensorFlow, not with Jupyter. Jupyter is just an editor and execution controller on the server side.

undefined symbol: cudnnCreate in ubuntu google cloud vm instance

I'm trying to run a TensorFlow Python script in a Google Cloud VM instance with GPU enabled. I have followed the process for installing the GPU drivers, CUDA, cuDNN and TensorFlow. However, whenever I try to run my program (which runs fine on a supercomputing cluster) I keep getting:
undefined symbol: cudnnCreate
I have added the following to my ~/.bashrc:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:/usr/local/cuda-8.0/lib64"
export CUDA_HOME="/usr/local/cuda-8.0"
export PATH="$PATH:/usr/local/cuda-8.0/bin"
but it still does not work and produces the same error.
Answering my own question: the issue was not that the library was not installed; the installed library was the wrong version, hence the symbol could not be found. In this case it was cuDNN 5.0. However, even after installing the right version it still didn't work, due to incompatibilities between the versions of the driver, CUDA and cuDNN. I solved all of these issues by re-installing everything, including the driver, taking into account the requirements of the TensorFlow libraries.
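As a quick diagnostic for this kind of error, you can check which libcudnn the dynamic loader resolves and whether it exports the missing symbol; a hedged sketch (the library path is an assumption based on the CUDA 8.0 paths above):
# Which libcudnn does the loader find?
ldconfig -p | grep libcudnn
# Does that library actually export cudnnCreate?
nm -D /usr/local/cuda-8.0/lib64/libcudnn.so | grep cudnnCreate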
