Can I create a docker in Google Colab? - python

I want to create a version of my Colab notebook that will be immune to changes in the standardly used versions of python and pytorch within colab. Essentially creating something like a docker that will not need to be updated. Is this possible?
Ideally I'd like to keep them as:
Python version: 3.7
PyTorch version: 1.10.0+cu111
CUDA version: 11.1
cuDNN version: 8005
Is this possible?

I don't know anything about Google Colab, but in general, you can achieve this by "pin" versions of:
docker image you built and always go off your custom docker image (meaning any changes to python and packages versions won't affect already built image) link
pin your packages inside Dockerfile so rebuilds ALWAYS use the same version
This could solve your use-case

Related

Create a custom SageMaker image with recent Python release

I am using Sagemaker Notebook Instances on AWS.
Looks like we can only use Python 3.6 kernels.
I would like to be able to use Python 3.10 (latest version, or at least Python 3.9) in a notebook.
So far, what I have tried is based on life cycle: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi-create-sdk.html
But somehow, it didn't work (I was not able to use the recent kernel in the notebook)
I have found an interesting link: https://github.com/aws-samples/sagemaker-studio-custom-image-samples
but my knowledge is a bit limited and I do not know what exactly I should look at precisely to see the example I should follow.
Any advice/lead you could suggest please ?
Thanks
SageMaker Data Science Kernel supports Python 3.6 version at the moment.
If you need a persistent custom kernel in SageMaker studio, you can create an ECR repository and build a docker image with custom environment configurations. This image can then be attached to the SageMaker studio notebooks. Reference link!

Confused with setting up ML and DL on GPU

My goal is to set up my PC for machine and deep learning through my GPU. I've read about all the different components however I can not connect the dots for what I need to do.
OS: Ubuntu 20.04
GPU: Nvidia RTX 2070 Super
Anaconda: 4.8.3
I've installed the nvidia-cuda-toolkit (10.1.243), but now what?
How does this integrate with jupyter notebook?
The 3 python modules I want to work with are:
turicreate - I've gotten this to run off CPU but not GPU
scikit-learn
tensorflow
matlab
I know cuDNN and pyCUDA fit in there somewhere.
Any help is appreciated. Thanks
First of all - I have the experience limited to ubuntu 18.04 and 16.xx and python DL frameworks. But I hope some sugestions will be helpfull.
If I were familiar with docker I would rather consider to use docker instead of setting-up everything from scratch. This approach is described in section about tensorflow container
If you decided to setup all components yourself please see this guideline
I used some contents from it for 18.04, succesfully.
be carefull with automatic updates. After the configuration is finished and tested protect it from being overwritten with newest version of CUDAor TensorRT.
Answering one of your sub-questions - How does this integrate with jupyter notebook? - it does not, becuase it is unneccesary. CUDA library cooperates with a framework such as Tensorflow, not with the Jupyter. Jupyter is just an editor and execution controller on the server side.

How to get allocated GPU spec in Google Colab

I'm using Google Colab for deep learning and I'm aware that they randomly allocate GPU's to users. I'd like to be able to see which GPU I've been allocated in any given session. Is there a way to do this in Google Colab notebooks?
Note that I am using Tensorflow if that helps.
Since you can run bash command in colab, just run !nvidia-smi:
This makes it easier to read
!nvidia-smi -L
Run this two commands in collab
CUDA: Let's check that Nvidia CUDA drivers are already pre-installed and which version is it.
!/usr/local/cuda/bin/nvcc --version
!nvidia-smi

Compiling binary with tensorflow library for cpu: Cannot find cuda library?

In development, I have been using the gpu-accelerated tensorflow
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.1-cp35-cp35m-linux_x86_64.whl
I am attempting to deploy my trained model along with an application binary for my users. I compile using PyInstaller (3.3.dev0+f0df2d2bb) on python 3.5.2 to create my application into a binary for my users.
For deployment, I install the cpu version, https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.1-cp35-cp35m-linux_x86_64.whl
However, upon successful compilation, I run my program and receive the infamous tensorflow cuda error:
tensorflow.python.framework.errors_impl.NotFoundError:
tensorflow/contrib/util/tensorflow/contrib/cudnn_rnn/python/ops/_cudnn_rnn_ops.so:
cannot open shared object file: No such file or directory
why is it looking for cuda when I've only got the cpu version installed? (Let alone the fact that I'm still on my development machine with cuda, so it should find it anyway. I can use tensorflow-gpu/cuda fine in uncompiled scripts. But this is irrelevant because deployment machines won't have cuda)
My first thought was that somehow I'm importing the wrong tensorflow, but I've not only used pip uninstall tensorflow-gpu but then I also went to delete the tensorflow-gpu in /usr/local/lib/python3.5/dist-packages/
Any ideas what could be happening? Maybe I need to start using a virtual-env..

Tensorflow Retrain on Windows

When I follow the tutorials of "How to Retrain Inception's Final Layer for New Categories", I need to build the retainer like this
bazel build tensorflow/examples/image_retraining:retrain
However, my tensorflow on windows does not have such directory. I am wondering why and how can I solve the problem?
Thank you in advance
In my case tensorflow version is 1.2 and corresponding retrain.py is here.
Download and extract flowers images from here.
Now run the the retrain.py file as
python retrain.py --image_dir=path\to\dir\where\flowers\images\where\extracted --output_lables=retrained_labels.txt --output_graph=retrained_graph.pb
note: the last two arguments in the above command are optional.
Now to test the retrained model:
go the master branch and download the label_image.py code as shown below
Then run python label_image.py --image=image/path/to/test/classfication --graph=retrained_graph.pb --labels=retrained_labels.txt
The result will be like
From the screenshot, it appears that you have installed the TensorFlow PIP package, whereas the instructions in the image retraining tutorial assume that you have cloned the Git repository (and can use bazel to build TensorFlow).
However, fortunately the script (retrain.py) for image retraining is a simple Python script, which you can download and run without building anything. Simply download the copy of retrain.py from the branch of the TensorFlow repository that matches your installed package (e.g. if you've installed TensorFlow 0.12, you can download this version), and you should be able to run it by typing python retrain.py at the Command Prompt.
I had the same problem on windows. My windows could not find script.retrain. I downloaded retrain.py file from tensoflow website at here. Then, copied the file in the tensorflow folder and run the retrain script using Python command.

Categories

Resources