I'm pretty new to Stack Overflow, and also to using PyTorch. I'm an AI and CS major working on a project that involves processing video with ML models. I won't go into the details because I want any answers to stay generally useful to others using PyTorch. At the moment I'm using PyTorch together with VapourSynth, accelerating both with CUDA, but I'm looking into purchasing an AI accelerator like this one:
Amazon
Documentation on using these accelerators with TensorFlow is fairly easy to find, but I'm having trouble working out how to use one with PyTorch. Does anybody have experience with this? I'd simply like to be able to use the card to accelerate training a neural net.
It is correct that you would need to convert your code to run on XLA, but that only involves changing a few lines. Please refer to the README at https://github.com/pytorch/xla for references and guides; with a few modifications you can get a significant training speedup.
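For example, a minimal sketch of the kind of change involved, using the torch_xla package (MyModel, loader and loss_fn are placeholders for your own code):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                 # the XLA/TPU device instead of "cuda"
model = MyModel().to(device)             # move the model to the XLA device
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for data, target in loader:
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()
    # replaces optimizer.step(); barrier=True also triggers the pending XLA step
    xm.optimizer_step(optimizer, barrier=True)
```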
I think the experience of using PyTorch on a TPU would be less smooth than on an NVIDIA GPU. As far as I know, you have to go through XLA to make PyTorch models runnable on a TPU.
I am working on a problem that uses both classical ML and deep learning in Python. The deep learning models are trained on the GPU, whereas the machine learning models run on the CPU. Since in my code the ML part comes after the DL part, it only executes once the DL part has finished. In theory, since they use different resources, they could run at the same time. Is there a way to do this? One naive option is to split the code into two scripts and run them separately, but I'm looking for a more sophisticated approach.
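For illustration, this is roughly the single-script version of the naive idea, using Python's multiprocessing (train_dl_model and train_ml_model are placeholders for my actual code):

```python
import multiprocessing as mp

def train_dl_model():
    # deep learning part, runs on the GPU (placeholder)
    pass

def train_ml_model():
    # classical ML part, runs on the CPU (placeholder)
    pass

if __name__ == "__main__":
    p_dl = mp.Process(target=train_dl_model)
    p_ml = mp.Process(target=train_ml_model)
    p_dl.start()
    p_ml.start()   # both processes now run at the same time on different resources
    p_dl.join()
    p_ml.join()
```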
Thanks
I've trained my model locally and now I want to use it in my Kubernetes cluster. Unfortunately, all the Docker images for PyTorch are 5+ GB because they contain the scripts for training, which I won't need now. I've created my own image, which is only 3.5 GB, but that is still huge. Is there a slim PyTorch build for inference only? If not, which parts of the package can I safely remove, and how?
Unfortunately there is no easy answer for the Python version of PyTorch (or at least none I'm aware of).
Python, in general, is not well suited for slim Docker deployments because it carries over all of its dependencies (even if you don't need all of their functionality; imports usually sit at the top of each file, which makes the removal you mention infeasible for a project of PyTorch's size and complexity).
There is a way out though...
torchscript
Given your trained model you can convert it to a traced/scripted version (see here) - a rough sketch of that step follows. After you manage that:
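Assuming `model` is your trained `nn.Module` and `example` is a representative input tensor, the conversion looks roughly like this:

```python
import torch

model.eval()  # inference mode: disables dropout, uses running batch-norm statistics

# trace the model with a representative input; use torch.jit.script(model)
# instead if the forward pass contains data-dependent control flow
traced = torch.jit.trace(model, example)

# the saved file can later be loaded without a Python runtime (e.g. from C++ or Java)
traced.save("model_traced.pt")
```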
Inference in other languages
Write your inference code in another language, either Java or C++ (see here for more info).
I have only used C++, but you might get there more easily with Java, I think.
Results
I managed to get PyTorch for CPU inference down to roughly ~32MB; a GPU build would weigh more and be much more complex, and would probably need around ~1GB for the cuDNN dependency alone.
C++ way
Please note that the torchlambda project is not currently maintained and I'm the creator; hopefully it gives you some tips at least.
See:
Dockerfile for the image build
CMake used for building
Docs for more info about compilation options etc.
C++ inference code
Additional notes:
It also uses AWS SDKs and you would have to remove them from at least these files
You don't need static compilation - it helps reach the lowest image size I could come up with, but it is not strictly necessary (it saves an additional ~100MB or so)
Final
Try Java first, as its packaging is probably saner (although the final image would probably be a little bigger)
The C++ way has not been tested with the newest PyTorch version and might change with basically any release
In general it takes A LOT of time and debugging, unfortunately.
I am currently working on 1D convolutional neural networks for time series classification. Recently I got CUDA working on my GeForce 3080 (which was a pain in itself). However, I noticed some weird behavior when using TensorFlow with CUDA: after training a model, the GPU memory is not released, even after deleting the variables and running garbage collection. I tried resetting the TF graph and closing the TF sessions, but the GPU memory stays allocated. This makes cross-validation crash, and I have to restart my Python environment every time I want to make changes and retrain my model.
After a tedious search, I found that people were already struggling with this five years ago. However, I am now on TF 2.7, running Ubuntu 20.04.3. Some of my colleagues are using Windows and are not experiencing these problems; they don't seem to have any issues with models failing to retrain because of already-allocated memory.
I found the workaround of using multiple processes, but wasn't able to get it to work for my model with 10-fold CV (roughly the pattern sketched below).
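This is roughly the pattern I was trying (simplified; train_fold stands in for my actual training code):

```python
import multiprocessing as mp

def train_fold(fold_idx, results):
    # TensorFlow is imported inside the child process, so all GPU memory it
    # allocates is released back to the driver when the process exits
    import tensorflow as tf
    # ... build and train the model for this fold (placeholder) ...
    results[fold_idx] = 0.0  # e.g. the validation score of this fold

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # child must not inherit CUDA state
    results = mp.Manager().dict()
    for fold in range(10):
        p = mp.Process(target=train_fold, args=(fold, results))
        p.start()
        p.join()  # wait; GPU memory is freed when the child terminates
```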
As the issue has been open for more than five years now and my colleagues aren't having any problems, I was wondering whether I'm doing something wrong. The issue may very well have been fixed in the meantime, which is why I think my code is the problem here.
Is there any solution or guide for TF 2.7 and GPU memory allocation?
I implemented a neural network class using only Python and NumPy, and I want to run some experiments with it. The problem is that it takes very long to train. My computer doesn't have a high-end GPU or a particularly good CPU, so I thought about some sort of 'cloud training'.
I know libraries such as TensorFlow or PyTorch use backends to train neural networks faster, and I was wondering if something similar could be achieved with numpy. Is there a way to run numpy in the cloud?
Even if it were slow and didn't use GPUs, that would be fine for me. I tried loading my files into Google Colab, but it didn't work very well; it stopped running due to inactivity after some time.
Is there any nice solution out there?
Thanks for reading it all!
Try using CuPy instead of NumPy: it runs on the GPU (and works well on a Colab GPU instance), and you probably only need to make a few small modifications to your code.
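For example (a minimal sketch; the idea is that CuPy mirrors the NumPy API, so you move the arrays to the GPU and the calls stay the same):

```python
import numpy as np
import cupy as cp

# plain NumPy on the CPU
a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)
c_cpu = a @ b

# the same computation on the GPU: move the arrays over and use cupy in place of numpy
a_gpu = cp.asarray(a)
b_gpu = cp.asarray(b)
c_gpu = a_gpu @ b_gpu

# bring the result back to the host as a NumPy array when needed
c_back = cp.asnumpy(c_gpu)
```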
I would like to train TensorFlow models with the Python API but use the trained graphs for inference in Matlab. I searched for ways to do this, but I can't seem to figure it out.
Does anybody have a good idea how to do this? Do I have to compile the model with Bazel? Do I do it with TensorFlow Serving? Do I load the metagraph in a C++ function that I include in Matlab?
Please keep in mind that I'm an engineer and don't have extensive programming knowledge :)
In case someone lands here with a similar question, I'd like to suggest tensorflow.m - a Matlab package I am currently writing (available on GitHub).
Although still in development, simple functionality like importing a frozen graph and running an inference is already possible (see the examples) - I believe this is what you were looking for?
The advantage is that you don't need any expensive toolbox nor a Python/TensorFlow installation on your machine. I'd be glad if the package is of use to anyone looking for a similar solution; even more so if you extend/implement something and open a PR.
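For completeness, the frozen graph is typically produced on the Python/TensorFlow side roughly like this (a TF 1.x-style sketch with a toy graph; in your case the session would hold your trained model, and "output" stands in for the name of its output node):

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util

# toy TF1-style graph: y = x @ w, with the output node named "output"
x = tf.placeholder(tf.float32, shape=[None, 3], name="input")
w = tf.Variable(tf.ones([3, 1]), name="w")
y = tf.identity(tf.matmul(x, w), name="output")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # bake the variable values into constants so the graph is self-contained
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ["output"])

# write the frozen graph to disk; this .pb file is what gets imported on the Matlab side
with tf.gfile.GFile("frozen_graph.pb", "wb") as f:
    f.write(frozen.SerializeToString())
```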