Amazon EC2 Tensorflow GPU support

Amazon EC2 Tensorflow GPU support - python

I tried looking around a lot and couldn't find a fix yet.
Things I tried included checking all the paths and environment variables.
When I try running Keras with the TensorFlow backend using Python 3.6 on the Deep Learning AMI (m4.xlarge)
As soon as Tensorflow is imported this is the output:
/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
Running:
print ("VERSION", tf.Session(config=tf.ConfigProto(log_device_placement=True)))
Returns:
2017-12-06 01:19:49.592416: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-12-06 01:19:49.603333: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_UNKNOWN
2017-12-06 01:19:49.603378: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:145] kernel driver does not appear to be running on this host (ip-172-31-41-243): /proc/driver/nvidia/version does not exist
Device mapping: no known devices.
2017-12-06 01:19:49.604178: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:
VERSION <tensorflow.python.client.session.Session object at 0x7f24f3d69c88>
Any clue why I can't get the GPUs to run?

I think the problem is well interesting. To enable the GPUs support, please make sure there is any supported GPUs device installed in your instance. As far as I know M4 instances do not provide any GPUs.
A solution for this problem, you should be start a new instance with GPUs like P3 and P2 instance. (For personal exploration, I suggest you start will spot instance, which will be more cost-effective.) Then run the same code again, I think it will works if there is no bugs in the code itself.

Related

PyTorch CUDA GPU not utilized properly

I am trying to train a pytorch model on my local machine. It has the following GPUs:
As you see the second is NVIDIA and thus should be used with CUDA. In fact if I check torch.cuda.device_count() it returns 1 and torch.cuda.get_device_name() returns NVIDIA GeForce 930MX. When I run the script however the usage of the built-in Intel GPU goes up to 100% and then the program crashes with:
OSError: [WinError 1450] Insufficient system resources exist to complete the requested service
The usage (as seen from the task manager) of the targeted GPU (NVIDIA) remains 0% so it has not been called.
What configuration steps I might have messed up and what would you propose in order to run PyTorch on the proper GPU.
*Using the LTS versions of torch and CUDA as of the day of posting the question.

Tensorflow GPU "SSE4.1 instructions" issue inside a Docker

I try to run tensorflow 1.13.1 inside a docker (the image with the wanted configuration is evariste/autodl:gpu-latest).
The docker has access to a RTX 2080 Ti GPU.
I get the following error:
2020-09-10 16:09:47.428460: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use SSE4.1 instructions, but these aren't available on your machine.

SSE4.1 is an instruction set supported by CPU, not GPU. Thus you need to check if your CPU supports it; more discussions about this topic can be found here.

TensorFlow warning: Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA [duplicate]

I have recently installed tensorflow (Windows CPU version) and received the following message:
Successfully installed tensorflow-1.4.0 tensorflow-tensorboard-0.4.0rc2
Then when I tried to run
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
sess.run(hello)
'Hello, TensorFlow!'
a = tf.constant(10)
b = tf.constant(32)
sess.run(a + b)
42
sess.close()
(which I found through https://github.com/tensorflow/tensorflow)
I received the following message:
2017-11-02 01:56:21.698935: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
But when I ran
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
it ran as it should and output Hello, TensorFlow!, which indicates that the installation was successful indeed but there is something else that is wrong.
Do you know what the problem is and how to fix it?

What is this warning about?
Modern CPUs provide a lot of low-level instructions, besides the usual arithmetic and logic, known as extensions, e.g. SSE2, SSE4, AVX, etc. From the Wikipedia:
Advanced Vector Extensions (AVX) are extensions to the x86 instruction
set architecture for microprocessors from Intel and AMD proposed by
Intel in March 2008 and first supported by Intel with the Sandy
Bridge processor shipping in Q1 2011 and later on by AMD with the
Bulldozer processor shipping in Q3 2011. AVX provides new features,
new instructions and a new coding scheme.
In particular, AVX introduces fused multiply-accumulate (FMA) operations, which speed up linear algebra computation, namely dot-product, matrix multiply, convolution, etc. Almost every machine-learning training involves a great deal of these operations, hence will be faster on a CPU that supports AVX and FMA (up to 300%). The warning states that your CPU does support AVX (hooray!).
I'd like to stress here: it's all about CPU only.
Why isn't it used then?
Because tensorflow default distribution is built without CPU extensions, such as SSE4.1, SSE4.2, AVX, AVX2, FMA, etc. The default builds (ones from pip install tensorflow) are intended to be compatible with as many CPUs as possible. Another argument is that even with these extensions CPU is a lot slower than a GPU, and it's expected for medium- and large-scale machine-learning training to be performed on a GPU.
What should you do?
If you have a GPU, you shouldn't care about AVX support, because most expensive ops will be dispatched on a GPU device (unless explicitly set not to). In this case, you can simply ignore this warning by
# Just disables the warning, doesn't take advantage of AVX/FMA to run faster
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
... or by setting export TF_CPP_MIN_LOG_LEVEL=2 if you're on Unix. Tensorflow is working fine anyway, but you won't see these annoying warnings.
If you don't have a GPU and want to utilize CPU as much as possible, you should build tensorflow from the source optimized for your CPU with AVX, AVX2, and FMA enabled if your CPU supports them. It's been discussed in this question and also this GitHub issue. Tensorflow uses an ad-hoc build system called bazel and building it is not that trivial, but is certainly doable. After this, not only will the warning disappear, tensorflow performance should also improve.

Update the tensorflow binary for your CPU & OS using this command
pip install --ignore-installed --upgrade "Download URL"
The download url of the whl file can be found here
https://github.com/lakshayg/tensorflow-build

CPU optimization with GPU
There are performance gains you can get by installing TensorFlow from the source even if you have a GPU and use it for training and inference. The reason is that some TF operations only have CPU implementation and cannot run on your GPU.
Also, there are some performance enhancement tips that makes good use of your CPU. TensorFlow's performance guide recommends the following:
Placing input pipeline operations on the CPU can significantly improve performance. Utilizing the CPU for the input pipeline frees the GPU to focus on training.
For best performance, you should write your code to utilize your CPU and GPU to work in tandem, and not dump it all on your GPU if you have one.
Having your TensorFlow binaries optimized for your CPU could pay off hours of saved running time and you have to do it once.

For Windows, you can check the official Intel MKL optimization for TensorFlow wheels that are compiled with AVX2. This solution speeds up my inference ~x3.
conda install tensorflow-mkl

For Windows (Thanks to the owner f040225), go to here: https://github.com/fo40225/tensorflow-windows-wheel to fetch the url for your environment based on the combination of "tf + python + cpu_instruction_extension". Then use this cmd to install:
pip install --ignore-installed --upgrade "URL"
If you encounter the "File is not a zip file" error, download the .whl to your local computer, and use this cmd to install:
pip install --ignore-installed --upgrade /path/target.whl

If you use the pip version of TensorFlow, it means it's already compiled and you are just installing it. Basically you install TensorFlow-GPU, but when you download it from the repository and trying to build it, you should build it with CPU AVX support. If you ignore it, you will get a warning every time when you run on the CPU. You can take a look at those too.
Proper way to compile Tensorflow with SSE4.2 and AVX
What is AVX Cpu support in tensorflow

The easiest way that I found to fix this is to uninstall everything then install a specific version of tensorflow-gpu:
uninstall tensorflow:
pip uninstall tensorflow
uninstall tensorflow-gpu: (make sure to run this even if you are not sure if you installed it)
pip uninstall tensorflow-gpu
Install specific tensorflow-gpu version:
pip install tensorflow-gpu==2.0.0
pip install tensorflow_hub
pip install tensorflow_datasets
You can check if this worked by adding the following code into a python file:
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub Version: ", hub.__version__)
print("GPU is", "available" if tf.config.experimental.list_physical_devices("GPU") else "NOT AVAILABLE")
Run the file and then the output should be something like this:
Version: 2.0.0
Eager mode: True
Hub Version: 0.7.0
GPU is available
Hope this helps

What worked for me tho is this library https://pypi.org/project/silence-tensorflow/
Install this library and do as instructed on the page, it works like a charm!

Try using anaconda. I had the same error. One lone option was to build tensorflow from source which took long time. I tried using conda and it worked.
Create a new environment in anaconda.
conda install -c conda-forge tensorflow
Then, it worked.

He provided once the list, deleted by someone but see the answer is
Download packages list
Output:
F:\temp\Python>python test_tf_logics_.py
[0, 0, 26, 12, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0]
[ 0 0 0 26 12 0 0 0 2 0 0 0 0 0 0 0 0]
[ 0 0 26 12 0 0 0 2 0 0 0 0 0 0 0 0 0]
2022-03-23 15:47:05.516025: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-23 15:47:06.161476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
[0 0 2 0 0 0 0 7 0 0 0 0 0 0 0 0 0]
...

As the message says, your CPU supports instructions that TensorFlow binary was not compiled to use. This should not be an issue with CPU version of TensorFlow since it does not perform AVX (Advanced Vector Extensions) instructions.
However, it seems that TensorFlow uses AVX instructions in some parts of the code and the message is just a warning, you can safely ignore it.
You can compile your own version of TensorFlow with AVX instructions.

Issues with flask in Windows PowerShell

I use Windows PowerShell to install flask but when I execute python app.py I get this error
C:\Users\User\Desktop\flask\Deployed Model> python app.py
Using TensorFlow backend.
2020-06-01 12:36:45.548048: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Model loaded. Start serving...
Can anyone help me with this error? I installed all the libraries in Windows PowerShell and I also installed tensorflow==2.0

Since I cannot comment I'm writing this answer instead.
It seems like you don't have a GPU installed on your PC and if its true then you need to compile tensorflow binary for use.
This warning is specific to your CPU and it tells that your CPU dosen't support AVX2.
But if you have a GPU then simply disable the warning using
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

Is it possible to run tensorflow-gpu on a computer without a GPU or CUDA?

I have two Windows computers, one with and one without a GPU.
I would like to deploy the same python script on both (TensorFlow 1.8 Object Detection), without changing the packaged version of TensorFlow. In other words, I want to run tensorflow-gpu on a CPU.
In the event where my script cannot detect nvcuda.dll, I've tried using a Session config to disable the GPU, like so:
config = tf.ConfigProto(
device_count = {'GPU': 0}
)
and:
with tf.Session(graph=detection_graph, config=config) as sess:
However, this is not sufficient, as TensorFlow still returns the error:
ImportError: Could not find 'nvcuda.dll'. TensorFlow requires that this DLL be installed in a directory that is named in your %PATH% environment variable.
Typically it is installed in 'C:\Windows\System32'. If it is not present, ensure that you have a CUDA-capable GPU with the correct driver installed.
Is there any way to disable checking for a GPU/CUDA entirely and default to CPU?
EDIT: I have read the year-old answer regarding tensorflow-gpu==1.0 on Linux posted here, which suggests this is impossible. I'm interested to know if this is still how tensorflow-gpu is compiled, 9 versions later.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.