Unable to install PyTorch from source for older GPU card - python

Good day,
I am going through the fast.ai courses for deep learning and want to set up a local fastai environment. However, when I try running the first part of the tutorial, I receive an error message stating:
Found GPU0 GeForce GTX 860M which is of cuda capability 5.0.
PyTorch no longer supports this GPU because it is too old.
I found that the problem is related to the PyTorch version: recent releases do not support older GPU cards. I found a suggested solution, which is to install an older PyTorch from source: http://forums.fast.ai/t/pytorch-not-working-with-an-old-nvidia-card/14632/7
However, when I clone the pytorch git repository and try to install it with setup.py, I receive an error message.
python setup.py install
running install
running build_deps
error: [WinError 2] The system cannot find the file specified
I have navigated to the correct folder “~\Fast_AI\Github material\fastai\pytorch”. When I list the files I can see setup.py, but it will not run.
Could anyone help solve this problem? I could use the CPU, but it is very slow for deep learning, so I would prefer to use the GPU.
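One possible reading of the WinError 2 above (an assumption, since the log does not say which file is missing) is that the build step launches an external program that Windows cannot find on PATH. A minimal sketch for checking that follows. The tool names in REQUIRED_TOOLS are guesses at what a source build might call, not a list taken from the PyTorch documentation, and find_missing_tools is a made-up helper name.

```python
import shutil

# Hypothetical list of external programs a source build might invoke.
REQUIRED_TOOLS = ["git", "cmake", "python"]


def find_missing_tools(tools):
    """Return the tools that cannot be located on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

If something is reported missing, installing it or adding its folder to PATH and rerunning python setup.py install would be the next thing to try.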

Related

Downgrade TensorFlow GPU from v2.8 to v2.7 in Google Colab

I have some models that I trained using TF and have been using for a while now, but since v2.8 came out I have been having issues with the models based on MobileNetV3 (large and small). I posted the issue on the TensorFlow GitHub and am waiting for a solution. In the meantime I want to make some predictions on Colab using v2.7 instead of v2.8. I know this involves installing CUDA and cuDNN. I am really inexperienced at this level of setting up TF. Does anyone know how to proceed with this? I saw this post but was hoping for a less intensive solution. For example, can I 'flash' an old Colab machine that has 2.7 set up?
As a side note, shouldn't Colab have options like this? The main reason I am using Colab is that I can run my code anywhere and that it is repeatable.
Also, I can install and run my code with the CPU version of v2.7, but I want to run on the GPU.
Thanks for your help!
Edit: sorry, I did a poor job of explaining what I already tried. I have tried using pip:
!pip install --upgrade tensorflow-gpu==2.7.*
!pip install --upgrade tensorflow==2.7.*
but I get this error:
UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
I have also pip-uninstalled Keras, TF and TF-GPU before installing, and I get the same error. Yes, I restarted the runtime as well. Someone mentioned that conda tries to install everything when installing TF; is this a possible solution?
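Before retrying the model, it may help to confirm which TensorFlow version the runtime actually loaded after the restart and whether it can see a GPU. This is a minimal sketch, assuming TensorFlow 2.x, where tf.config.list_physical_devices is available. The helper name tf_gpu_report is made up for illustration, and the sketch only reports state; it does not fix the cuDNN error.

```python
def tf_gpu_report():
    """Return the loaded TensorFlow version and the GPUs it can see."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this interpreter.
        return {"installed": False, "version": None, "gpus": []}
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "installed": True,
        "version": tf.__version__,
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    print(tf_gpu_report())
```

If the reported version is still 2.8, the pip install did not take effect in the running kernel. If the version is 2.7 but the GPU list is empty, the CUDA/cuDNN libraries on the machine probably do not match what that build expects.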

Existing Tensorflow model to use GPU

I made a TensorFlow model without using CUDA, but it is very slow. Fortunately, I gained access to a Linux server (Ubuntu 18.04.3 LTS) with a GeForce 1060. The necessary components are installed, and I tested that CUDA acceleration works.
The tensorflow-gpu package is installed in my virtual environment (only 1.14.0 works with my code).
My code does not contain any CUDA-related snippets. I assumed that if I ran it on a PC with a CUDA-enabled environment, it would use the GPU automatically.
I tried wrapping my code in with tf.device('/GPU:0'): but it didn't work. I got a strange error saying that only XLA_CPU, CPU and XLA_GPU are available. I tried XLA_GPU as well, but that didn't work either.
Is there any guide on how to change existing code to take advantage of CUDA?
There is not enough information to give an exact answer.
Have you installed tensorflow-gpu separately? Check using pip list.
Initially you were using tensorflow, which is CPU-only by default.
Once you want to use the NVIDIA GPU, make sure to install tensorflow-gpu.
I have sometimes had problems with both installed at the same time: it would always go for the CPU. Once I removed tensorflow with "pip uninstall tensorflow" and kept only the GPU version, it worked for me.
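The pip list check can also be done from Python. This is a minimal sketch, assuming Python 3.8 or newer, where importlib.metadata is in the standard library; on older interpreters pip list gives the same information. The helper name installed_versions is made up for illustration.

```python
from importlib import metadata


def installed_versions(names=("tensorflow", "tensorflow-gpu")):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    found = installed_versions()
    print(found)
    if all(found.values()):
        print("Both packages are installed; consider keeping only one.")
```

Run it with the interpreter from the virtual environment in question, since it only reports what is installed for the Python that executes it.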

Can I let people use a different Tensorflow-gpu version above what they had installed with different CUDA dependencies?

I am trying to package and release a project that uses tensorflow-gpu. Since my intention is to make installation as easy as possible, I do not want users to compile tensorflow-gpu from scratch, so I decided to use pipenv to install whatever version pip provides.
I realized that although everything works in my original local setup, I cannot import tensorflow in the virtualenv version:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
Although this seems easy to fix by changing local symlinks, that may break my local tensorflow and goes against the concept of a virtualenv. I also have no idea how people installed CUDA on their machines, so it doesn't seem promising for portability.
What can I do to ensure that tensorflow-gpu works when someone from the internet gets my project with only the instruction "install CUDA X.X"? Should I fall back to tensorflow to ensure compatibility, and let my users install tensorflow-gpu manually?
Getting tensorflow-gpu working on a machine involves a series of steps, including installing CUDA and cuDNN, the latter requiring approval from NVIDIA. Many machines would not even meet the required configuration for tensorflow-gpu, e.g. any machine without a modern NVIDIA GPU. You may want to state the tensorflow-gpu requirement and leave it to the user to meet it, with appropriate pointers for guidance. If the project works acceptably on CPU-only tensorflow, that would be a much easier fallback option.
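One way an installer or setup guide could decide between the two packages is to check whether the CUDA shared libraries resolve at all before recommending the GPU build. This is a minimal sketch using the standard-library ctypes.util.find_library. The library names in cuda_libs and the helper name pick_tensorflow_package are assumptions for illustration, and the check does not verify the library version (for example 9.0), only that some copy can be located.

```python
import ctypes.util


def pick_tensorflow_package(cuda_libs=("cublas", "cudart")):
    """Suggest 'tensorflow-gpu' only if every listed CUDA library resolves."""
    missing = [lib for lib in cuda_libs if ctypes.util.find_library(lib) is None]
    package = "tensorflow" if missing else "tensorflow-gpu"
    return package, missing


if __name__ == "__main__":
    package, missing = pick_tensorflow_package()
    print("Suggested package:", package)
    if missing:
        print("CUDA libraries not found:", ", ".join(missing))
```

A script like this could print a pointer to the CUDA installation guide when libraries are missing, instead of failing later with an ImportError.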

What is the correct way of setting up Tensorflow on Linux, after all?

I'm having a misinformation problem regarding Tensorflow. There is lots of info in lots of places, and it is never complete enough.
I got my system set up with CUDA 8.0 and cuDNN, and I have Keras + Theano working OK with Python 2.7. I'm trying to move to Tensorflow.
As I had compatibility problems with numpy and other stuff when I tried to install it in the same environment, I installed miniconda2, created a virtual env for it with conda create -n tensorflow pip, and activated it, as instructed here: https://www.tensorflow.org/install/install_linux#InstallingAnaconda
The environment seems operational.
Afterwards, I installed tensorflow from https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.1-cp27-none-linux_x86_64.whl and also Keras, only to notice that some modules were duplicated in conda list, some marked with a version string, others marked with <pip> only. Specifically, I got both Tensorflow-gpu 1.2.1 and Tensorflow 1.1.0. The old version just comes along with Keras.
Also, there's a myriad of warnings about Tensorflow not being compiled to use certain CPU instruction sets. There's this answer, How to compile Tensorflow with SSE4.2 and AVX instructions?, about compiling it using bazel, but I can't find any information about where to put the source code and which files to move where after running that bazel command line.
To make matters worse, whenever I run a simple 20x20 matrix multiplication with "/gpu:0" as the device, the code lists those horrendous warnings and correctly detects the presence of a GTX 1070, but never really confirms that it was used to do the calculations. And it runs faster on "/cpu:0". How I miss Theano...
Could someone point me to where I can find:
what version to download of Tensorflow that is current (not necessarily latest)?
concise steps to get it done and how to test if those steps went right?
I'm using Linux Mint 18.
I have used conda and installed Tensorflow=1.1.0, but it never seemed to work correctly within Python. I also came across GitHub issues saying that Anaconda is currently working on the Tensorflow GPU version, so no matter what I tried in Anaconda, it never used my NVIDIA Tesla P100-SXM2-16GB card and used only the CPU.
I suggest you use the normal environment till they get Tensorflow-gpu to work right in Anaconda.
To check if the tensorflow-gpu works I used the Inception v3 model with TF0.12 / TF1.0.
This is the process that I go through to install tensorflow1.0:
Step 0.
sudo -i
apt-get install aptitude
aptitude install software-properties-common
apt-get install libcupti-dev pip
apt-get update
apt-get upgrade libc6
Step 1. Install the NVIDIA components (I think you already have these installed).
Download NVIDIA cuDNN 5.1 for CUDA 8.0 from
https://developer.nvidia.com/rdp/cudnn-download
(Registration in NVIDIA's Accelerated Computing Developer Program is required.)
cuDNN 5.1 works well with most architectures and operating systems out there.
Step 2. Install bazel and tensorflow
apt-get install bazel
You can go to https://pypi.python.org/pypi/tensorflow-gpu/1.1.0rc0 and run
pip install <python-wheel-version>
If you have both python2.7 and python 3.* installed, use pip2 to install for python2.7.
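When several Python versions are installed, two things tend to go wrong: picking a wheel built for the wrong interpreter, and running a pip that belongs to a different interpreter. This is a minimal sketch of both checks. The helper names cpython_tag and pip_install_command are made up for illustration, and the wheel argument is whatever file or package name you chose above.

```python
import sys


def cpython_tag(version_info=None):
    """Return the CPython wheel tag, e.g. 'cp27' for Python 2.7."""
    major, minor = (version_info or sys.version_info)[:2]
    return "cp{}{}".format(major, minor)


def pip_install_command(wheel):
    """Build a pip command tied to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", wheel]


if __name__ == "__main__":
    print("Look for a wheel tagged:", cpython_tag())
    print("Install with:", " ".join(pip_install_command("<python-wheel-version>")))
```

The tag should match the one in the wheel filename (the cp27 part of a name such as tensorflow_gpu-1.2.1-cp27-none-linux_x86_64.whl). Running pip through sys.executable avoids guessing between pip, pip2 and pip3.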
Step 3. Install openjdk
apt-get install openjdk-8-jdk
Step 4. git clone the Inception model code
git clone https://github.com/tensorflow/models.git
cd models
git checkout master
cd inception
This is where bazel comes into the picture. See Bazel's Getting Started docs for a more detailed explanation of what a target is. If you run
ls -lstr
you might see 5 bazel-related symbolic links:
bazel-bin bazel-genfiles bazel-inception bazel-out bazel-testlogs
These are the target directories into which you build your specific model.
Assuming you're in the models/inception directory:
bazel build inception/imagenet_train
This activates the symbolic links.
NOTE: For imagenet_train.py to work, you need to prepare the ImageNet dataset. You can either skip this part or go through the following:
Step 5. Prepare the ImageNet dataset
Before you run the training script for the first time, you will need to download and convert the ImageNet data to native TFRecord format.
To begin, you will need to sign up for an account with ImageNet to gain access to the data. Look for the sign-up page, create an account and request an access key to download the data.
After you have USERNAME and PASSWORD, you are ready to run our script. Make sure that your hard disk has at least 500 GB of free space for downloading and storing the data. Here we select DATA_DIR=$HOME/imagenet-data as such a location but feel free to edit accordingly.
When you run the below script, please enter USERNAME and PASSWORD when prompted. This will occur at the very beginning. Once these values are entered, you will not need to interact with the script again.
#location of where to place the ImageNet data
DATA_DIR=$HOME/imagenet-data
Here $HOME is /root
# build the preprocessing script.
bazel build inception/download_and_preprocess_imagenet
# run it
bazel-bin/inception/download_and_preprocess_imagenet "${DATA_DIR}"
# Place the tensor records at /root/dataset
Step 6. Source bazel and tensorflow
This step is very important. It activates the Python packages, and I think you may be getting errors because the Python package for tensorflow is not activated.
If you skipped step 5, you might want to go to
/models/inception/sample
and run the gpu.py script:
python gpu.py
This should verify that your tensorflow version works with your GPU.
The activation commands for this step are:
source /opt/DL/bazel/bin/bazel-activate
source /opt/DL/tensorflow/bin/tensorflow-activate
You can also check by importing tensorflow in Python, e.g.:
import tensorflow as tf
Then find a hello-world example on their site; if it gives errors, tensorflow has not been installed properly.
Step 7. Run the ImageNet training. You can skip this step if you skipped step 5.
bazel-bin/inception/imagenet_train --num_gpus=1 --batch_size=256 --train_dir=/tmp --data_dir=/root/dataset/ --max_steps=100

Installing Theano and Keras on my Windows 10 Workstation

I am trying to use Keras to develop a neural network in Python. After managing to install Anaconda3 on my Windows 10 workstation (with all its libraries: numpy, scikit-learn, pandas, SciPy and matplotlib), I realized that I also need TensorFlow or Theano.
After I failed to install TensorFlow, I downloaded and was able to install Theano, but when I tried to import it from the Python prompt, I received the following:
WARNING: "g++ not detected! Theano will be unable to execute optimized C implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string"
Hoping to solve the problem this way, I downloaded the GNU C++ compiler for Cygwin64, but nothing changed at all! Assuming this is really the right way forward, how do I set the "Theano flags cxx"?
First, running Theano without g++ is only a performance issue. It is a warning, not an exception, when you import it.
But you probably want performance when using a deep learning library like Keras, so let's try to fix the Theano installation.
Please follow the Theano docs on installing Theano on Windows. You might want to clean up any previously installed requirements first.
To install GCC, follow this section, which says:
Theano C code compiler currently requires a GCC installation. We have
used the build TDM GCC which is provided for both 32- and 64-bit
platforms...
Download it from here and follow the installation instructions.
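As for how to reach the cxx flag: Theano reads its configuration from the THEANO_FLAGS environment variable (or a .theanorc file), and it has to be set before theano is imported. The sketch below first checks whether a g++ is visible on PATH at all, which may explain why installing a compiler changed nothing. The helper name theano_flags_for_compiler is made up for illustration, and passing a full path that contains spaces is untested here.

```python
import os
import shutil


def theano_flags_for_compiler(compiler="g++"):
    """Return a THEANO_FLAGS value based on whether the compiler is on PATH."""
    path = shutil.which(compiler)
    if path is None:
        # No compiler found: an empty cxx silences the warning,
        # but Theano keeps using the slow Python implementations.
        return "cxx="
    return "cxx=" + path


if __name__ == "__main__":
    # Must be set before `import theano` for the flag to take effect.
    os.environ["THEANO_FLAGS"] = theano_flags_for_compiler()
    print("THEANO_FLAGS =", os.environ["THEANO_FLAGS"])
```

If this prints an empty cxx, the compiler's bin folder is not on PATH for the Python process, so Theano cannot see it either.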
Tensorflow
I recommend working with Tensorflow, as Keras recently changed its default backend from Theano to Tensorflow.
Using Anaconda and pip, you should be able to simply run pip install tensorflow and it will work.
In fact, just today I installed Keras and Tensorflow on Windows 10 using Anaconda by running pip install keras tensorflow, so I suggest you try a fresh, clean installation of Anaconda and Python and try this again.
Please post an update if you succeed or run into other issues installing Theano / Tensorflow / Keras.
