How to get currently available GPUs in TensorFlow? - python

I plan to use distributed TensorFlow, and I saw that TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have 0, 1, or more GPUs, and I want to run my TensorFlow graph on GPUs on as many machines as possible.
I found that when running tf.Session() TensorFlow gives information about GPU in the log messages like below:
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
My question is: how do I get information about the currently available GPUs from TensorFlow? I can get loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way.
I also might restrict GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I don't want a way of getting GPU information from the OS kernel.
In short, I want a function like tf.get_available_gpus() that will return ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?

There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:
from tensorflow.python.client import device_lib
def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, will allocate all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or with allow_growth=True, to prevent all of the memory from being allocated. See this question for more details.
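For example, a minimal sketch of that workaround using the TF 1.x session API (allow_growth here; the commented-out fraction value is just an illustrative choice):
import tensorflow as tf
from tensorflow.python.client import device_lib
# Create a session first so GPU initialization does not grab all memory.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # or: config.gpu_options.per_process_gpu_memory_fraction = 0.01
sess = tf.Session(config=config)
# Now listing local devices will not allocate the remaining GPU memory.
gpu_names = [d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU']
print(gpu_names)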

You can check the full device list using the following code:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

There are also methods in the test utilities.
So all that needs to be done is:
tf.test.is_gpu_available()
and/or
tf.test.gpu_device_name()
Look up the Tensorflow docs for arguments.
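A quick usage sketch (note that in TF 2.x, tf.test.is_gpu_available is deprecated in favour of tf.config.list_physical_devices('GPU')):
import tensorflow as tf
if tf.test.is_gpu_available():
    # Returns the name of the first GPU device, e.g. '/device:GPU:0'
    print("Default GPU:", tf.test.gpu_device_name())
else:
    print("No GPU found")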

Since TensorFlow 2.1, you can use tf.config.list_physical_devices('GPU'):
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, " Type:", gpu.device_type)
If you have two GPUs installed, it outputs this:
Name: /physical_device:GPU:0 Type: GPU
Name: /physical_device:GPU:1 Type: GPU
In TF 2.0, you must add experimental:
gpus = tf.config.experimental.list_physical_devices('GPU')
See the TensorFlow guide pages and the current API documentation.

The accepted answer gives you the number of GPUs, but it also allocates all the memory on those GPUs, which may be unwanted for some applications. You can avoid this by creating a session with a fixed, lower memory allocation before calling device_lib.list_local_devices().
I ended up using nvidia-smi to get the number of GPUs without allocating any memory on them.
import subprocess
# Count GPUs by counting the UUID lines in the `nvidia-smi -L` output
n = str(subprocess.check_output(["nvidia-smi", "-L"])).count('UUID')
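If you also want device strings in the same '/gpu:N' format as the question, here is a small extension of the same idea (it assumes nvidia-smi is on the PATH and prints one 'UUID' per listed GPU):
import subprocess
def get_available_gpus_via_nvidia_smi():
    # One line per GPU, e.g. "GPU 0: GeForce GTX 1080 (UUID: GPU-...)"
    output = subprocess.check_output(["nvidia-smi", "-L"]).decode()
    n = output.count("UUID")
    return ["/gpu:%d" % i for i in range(n)]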

Apart from the excellent explanation by mrry, who suggested using device_lib.list_local_devices(), I can show you how to check for GPU-related information from the command line.
Because currently only Nvidia GPUs work with most NN frameworks, this answer covers only them. Nvidia has a page where it documents how you can use the /proc filesystem interface to obtain run-time information about the driver, any installed NVIDIA graphics cards, and the AGP status.
/proc/driver/nvidia/gpus/0..N/information
Provides information about each of the installed NVIDIA graphics adapters (model name, IRQ, BIOS version, Bus Type). Note that the BIOS version is only available while X is running.
So you can run cat /proc/driver/nvidia/gpus/0/information from the command line and see information about your first GPU. It is easy to run this from Python, and you can check the second, third, fourth GPU until it fails.
Mrry's answer is definitely more robust, and I am not sure whether mine will work on a non-Linux machine, but Nvidia's page provides other interesting information which not many people know about.
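For completeness, a minimal Python sketch of the same probing (Linux only; it simply reads whatever entries exist under the proc directory, since newer drivers may key them by PCI bus ID rather than 0..N):
import os
def get_nvidia_gpu_info(proc_dir="/proc/driver/nvidia/gpus"):
    # Each subdirectory corresponds to one installed GPU.
    infos = {}
    if not os.path.isdir(proc_dir):
        return infos  # no Nvidia driver, or not Linux
    for entry in sorted(os.listdir(proc_dir)):
        info_path = os.path.join(proc_dir, entry, "information")
        if os.path.exists(info_path):
            with open(info_path) as f:
                infos[entry] = f.read()
    return infos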

The following works in tensorflow 2:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, " Type:", gpu.device_type)
From 2.1, you can drop experimental:
gpus = tf.config.list_physical_devices('GPU')
https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices

I have a GPU called NVIDIA GeForce GTX 1650 Ti in my machine, with tensorflow-gpu==2.2.0.
Run the following two lines of code:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Output:
Num GPUs Available: 1

In TensorFlow Core v2.3.0, the following code should work.
import tensorflow as tf
visible_devices = tf.config.get_visible_devices()
for devices in visible_devices:
    print(devices)
Depending on your environment, this code will produce results like the following:
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')

The approach recommended by the latest TensorFlow versions:
tf.config.list_physical_devices('GPU')

I am working with TF 2.1 and torch, so I don't want to tie this automatic choosing to any particular ML framework. I just use plain nvidia-smi and os.environ to get a vacant GPU.
import os
import subprocess

def auto_gpu_selection(usage_max=0.01, mem_max=0.05):
    """Auto set CUDA_VISIBLE_DEVICES to a vacant GPU.

    :param usage_max: max fraction of GPU utilization considered vacant
    :param mem_max: max fraction of GPU memory considered vacant
    """
    os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
    log = str(subprocess.check_output("nvidia-smi", shell=True)).split(r"\n")[6:-1]
    gpu = 0
    # Maximum of GPUs, 8 is enough for most machines
    for i in range(8):
        idx = i * 3 + 2
        if idx > len(log) - 1:
            break
        inf = log[idx].split("|")
        if len(inf) < 3:
            break
        usage = int(inf[3].split("%")[0].strip())
        mem_now = int(str(inf[2].split("/")[0]).strip()[:-3])
        mem_all = int(str(inf[2].split("/")[1]).strip()[:-3])
        # print("GPU-%d : Usage:[%d%%]" % (gpu, usage))
        if usage < 100 * usage_max and mem_now < mem_max * mem_all:
            os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
            print("\nAuto choosing vacant GPU-%d : Memory:[%dMiB/%dMiB] , GPU-Util:[%d%%]\n" %
                  (gpu, mem_now, mem_all, usage))
            return
        print("GPU-%d is busy: Memory:[%dMiB/%dMiB] , GPU-Util:[%d%%]" %
              (gpu, mem_now, mem_all, usage))
        gpu += 1
    print("\nNo vacant GPU, use CPU instead\n")
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
If it can find a vacant GPU, it sets CUDA_VISIBLE_DEVICES to the index of that GPU (ordered by PCI bus ID):
GPU-0 is busy: Memory:[5738MiB/11019MiB] , GPU-Util:[60%]
GPU-1 is busy: Memory:[9688MiB/11019MiB] , GPU-Util:[78%]
Auto choosing vacant GPU-2 : Memory:[1MiB/11019MiB] , GPU-Util:[0%]
Otherwise, it sets it to -1 to use the CPU:
GPU-0 is busy: Memory:[8900MiB/11019MiB] , GPU-Util:[95%]
GPU-1 is busy: Memory:[4674MiB/11019MiB] , GPU-Util:[35%]
GPU-2 is busy: Memory:[9784MiB/11016MiB] , GPU-Util:[74%]
No vacant GPU, use CPU instead
Note: Call this function before you import any ML framework that requires a GPU, so it can automatically choose one (see the usage sketch below). It also makes it easy to launch multiple tasks.
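A minimal usage sketch, calling the function above before the framework import as the note suggests:
# Pick a GPU (or fall back to CPU) before importing the framework,
# so it only ever sees the chosen device.
auto_gpu_selection()
import tensorflow as tf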

Use this approach to check all the parts:
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
version = tf.__version__
executing_eagerly = tf.executing_eagerly()
hub_version = hub.__version__
available = tf.config.experimental.list_physical_devices("GPU")
print("Version: ", version)
print("Eager mode: ", executing_eagerly)
print("Hub Version: ", h_version)
print("GPU is", "available" if avai else "NOT AVAILABLE")

Ensure you have the latest TensorFlow 2.x GPU build installed on a machine with GPU support, then execute the following code in Python:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
You will get output that looks like this:
2020-02-07 10:45:37.587838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-07 10:45:37.588896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
Num GPUs Available: 8

Run the following in any shell
python -c "import tensorflow as tf; print(\"Num GPUs Available: \", len(tf.config.list_physical_devices('GPU')))"

You can use the following code to show the device name, type, memory, and locality.
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Related

How to get CPU and Memory usages while learning using Tensorflow 2.X Keras fit()

I am testing my system for TensorFlow keras machine learning.
This is my testing environment:
os : windows10 pro
tensorflow 2.3
python 3.7
And here is my code
[method for cpu and memory usages]
import os
import psutil
def check_cpu_mem():
    cpu_usage = psutil.cpu_percent()
    memory_usage = psutil.virtual_memory().percent
    print("- cpu usage : ", cpu_usage, "%")
    print("- memory usage : ", memory_usage, "%")
[call method]
model = get_model(train_input.shape[1], train_input.shape[2], 11)
history = model.fit(train_input, train_output, batch_size=BATCH_SIZE, epochs=EPOCHS)
check_cpu_mem()
I call check_cpu_mem() after fit.
I don't think this gives accurate information.
How can I get the exact CPU and memory usage percentages while training is running?
I found the way for Linux but not Windows.
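One approach that should also work on Windows is to sample usage from inside training via a custom Keras callback. This is only a sketch, assuming psutil is installed; the model/data names in the commented usage line come from the question:
import psutil
import tensorflow as tf
class ResourceUsageCallback(tf.keras.callbacks.Callback):
    """Print CPU and memory usage while fit() is running."""
    def on_train_batch_end(self, batch, logs=None):
        cpu = psutil.cpu_percent(interval=None)   # CPU usage since the last call
        mem = psutil.virtual_memory().percent     # system-wide RAM usage
        if batch % 100 == 0:                      # avoid printing every single batch
            print(" - cpu: %.1f%%  mem: %.1f%%" % (cpu, mem))
# usage:
# history = model.fit(train_input, train_output, batch_size=BATCH_SIZE,
#                     epochs=EPOCHS, callbacks=[ResourceUsageCallback()])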

Tensorflow 2: how to switch execution from GPU to CPU and back?

In tensorflow 1.X with standalone keras 2.X, I used to switch between training on GPU, and running inference on CPU (much faster for some reason for my RNN models) with the following snippet:
from multiprocessing import cpu_count
import keras
import keras.backend as k
import tensorflow as tf

keras.backend.clear_session()

def set_session(gpus: int = 0):
    num_cores = cpu_count()
    config = tf.ConfigProto(
        intra_op_parallelism_threads=num_cores,
        inter_op_parallelism_threads=num_cores,
        allow_soft_placement=True,
        device_count={"CPU": 1, "GPU": gpus},
    )
    session = tf.Session(config=config)
    k.set_session(session)
This ConfigProto functionality is no longer available in tensorflow 2.0 (there I'm using the integrated tensorflow.keras). In the beginning, it is possible to run tf.config.experimental.set_visible_devices() in order to e.g. disable the GPU, but any subsequent calls to set_visible_devices result in RuntimeError: Visible devices cannot be modified after being initialized. Is there a way of re-initializing the visible devices or is there another way of switching the devices available?
You can use tf.device to explicitly set which device you want to use. For example:
import tensorflow as tf
model = tf.keras.Model(...)
# Run training on GPU
with tf.device('/gpu:0'):
    model.fit(...)
# Run inference on CPU
with tf.device('/cpu:0'):
    model.predict(...)
If you only have one CPU and one GPU, the names used above should work. Otherwise, device_lib.list_local_devices() can give you a list of your devices. This post gives a nice function for listing just the names, which I adapt here to also show CPUs:
from tensorflow.python.client import device_lib
def get_available_devices():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU' or x.device_type == 'CPU']
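If you really do want to make the GPU invisible rather than just placing ops, here is a sketch of the approach mentioned in the question; it must run before any tensor or model touches the GPU, otherwise you hit the RuntimeError described above:
import tensorflow as tf
def hide_gpus():
    # Must be called before any op initializes the devices.
    tf.config.set_visible_devices([], 'GPU')
    print("Visible GPUs:", tf.config.get_visible_devices('GPU'))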
Does using tf.device help you?
With that, you can place some operations either on the CPU or on the GPU.
I would just restart the kernel; this worked for me.

Disable Tensorflow debugging information

By debugging information I mean what TensorFlow shows in my terminal about loaded libraries and found devices etc. not Python errors.
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Graphics Device
major: 5 minor: 2 memoryClockRate (GHz) 1.0885
pciBusID 0000:04:00.0
Total memory: 12.00GiB
Free memory: 11.83GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
...
You can disable all debugging logs using os.environ :
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
Tested on tf 0.12 and 1.0
In detail:
0 = all messages are logged (default behavior)
1 = INFO messages are not printed
2 = INFO and WARNING messages are not printed
3 = INFO, WARNING, and ERROR messages are not printed
2.0 Update (10/8/19)
Setting TF_CPP_MIN_LOG_LEVEL should still work (see below in the v0.12+ update), but there was a reported issue for versions 2.0 through 2.3.x, fixed in 2.4 and later. If setting TF_CPP_MIN_LOG_LEVEL does not work for you (again, see below), try the following to set the log level:
import tensorflow as tf
tf.get_logger().setLevel('INFO')
In addition, please see the documentation on tf.autograph.set_verbosity which sets the verbosity of autograph log messages - for example:
# Can also be set using the AUTOGRAPH_VERBOSITY environment variable
tf.autograph.set_verbosity(1)
v0.12+ Update (5/20/17), Working through TF 2.0+:
In TensorFlow 0.12+, per this issue, you can now control logging via the environmental variable called TF_CPP_MIN_LOG_LEVEL; it defaults to 0 (all logs shown) but can be set to one of the following values under the Level column.
Level | Level for Humans | Level Description
-------|------------------|------------------------------------
0 | INFO | [Default] Print all messages
1 | WARNING | Filter out INFO messages
2 | ERROR | Filter out INFO & WARNING messages
3 | NONE | Filter out all messages
See the following generic OS example using Python:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
import tensorflow as tf
You can set this environmental variable in the environment that you run your script in. For example, with bash this can be in the file ~/.bashrc, /etc/environment, /etc/profile, or in the actual shell as:
TF_CPP_MIN_LOG_LEVEL=2 python my_tf_script.py
To be thorough, you can also set the level for the Python tf_logging module, which is used in e.g. summary ops, TensorBoard, various estimators, etc.
# append to lines above
tf.logging.set_verbosity(tf.logging.ERROR) # or any {DEBUG, INFO, WARN, ERROR, FATAL}
For 1.14 you will receive warnings if you do not change to use the v1 API as follows:
# append to lines above
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR) # or any {DEBUG, INFO, WARN, ERROR, FATAL}
**For Prior Versions of TensorFlow or TF-Learn Logging (v0.11.x or lower):**
View the page below for information on TensorFlow logging; with the new update, you're able to set the logging verbosity to either DEBUG, INFO, WARN, ERROR, or FATAL. For example:
tf.logging.set_verbosity(tf.logging.ERROR)
The page additionally goes over monitors which can be used with TF-Learn models. Here is the page.
This doesn't block all logging, though (only TF-Learn). I have two solutions; one is a 'technically correct' solution (Linux) and the other involves rebuilding TensorFlow.
script -c 'python [FILENAME].py' | grep -v 'I tensorflow/'
For the other, please see this answer which involves modifying source and rebuilding TensorFlow.
For compatibility with Tensorflow 2.0, you can use tf.get_logger
import logging
tf.get_logger().setLevel(logging.ERROR)
I have had this problem as well (on tensorflow-0.10.0rc0), but could not fix the excessive nose tests logging problem via the suggested answers.
I managed to solve this by probing directly into the tensorflow logger. Not the most correct of fixes, but works great and only pollutes the test files which directly or indirectly import tensorflow:
# Place this before directly or indirectly importing tensorflow
import logging
logging.getLogger("tensorflow").setLevel(logging.WARNING)
To anyone still struggling to get the os.environ solution to work as I was, check that this is placed before you import tensorflow in your script, just like mwweb's answer:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
import tensorflow as tf
I solved it with the help of this post (Cannot remove all warnings #27045), and the solution was:
import logging
logging.getLogger('tensorflow').disabled = True
I am using TensorFlow version 2.3.1, and none of the solutions above was fully effective, until I found this package: silence-tensorflow.
Install like this:
with Anaconda,
python -m pip install silence-tensorflow
with IDEs,
pip install silence-tensorflow
And add to the first line of code:
from silence_tensorflow import silence_tensorflow
silence_tensorflow()
That's It!
As TF_CPP_MIN_LOG_LEVEL didn't work for me, you can try:
tf.logging.set_verbosity(tf.logging.WARN)
Worked for me in tensorflow v1.6.0
The usual Python 3 logging manager works for me with tensorflow==1.11.0:
import logging
logging.getLogger('tensorflow').setLevel(logging.INFO)
For tensorflow 2.1.0, the following code works fine.
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
To add some flexibility here, you can achieve more fine-grained control over the level of logging by writing a function that filters out messages however you like:
logging.getLogger('tensorflow').addFilter(my_filter_func)
where my_filter_func accepts a LogRecord object as input [LogRecord docs] and returns zero if you want the message thrown out, nonzero otherwise.
Here's an example filter that only keeps every nth info message (Python 3 due to the use of nonlocal here):
def keep_every_nth_info(n):
    i = -1
    def filter_record(record):
        nonlocal i
        i += 1
        return int(record.levelname != 'INFO' or i % n == 0)
    return filter_record
# Example usage for TensorFlow:
logging.getLogger('tensorflow').addFilter(keep_every_nth_info(5))
All of the above has assumed that TensorFlow has set up its logging state already. You can ensure this without side effects by calling tf.logging.get_verbosity() before adding a filter.
Yeah, I'm using tf 2.0-beta and want to enable/disable the default logging. The environment variable and methods in tf1.X don't seem to exist anymore.
I stepped around in PDB and found this to work:
# close the TF2 logger
tf2logger = tf.get_logger()
tf2logger.error('Close TF2 logger handlers')
tf2logger.root.removeHandler(tf2logger.root.handlers[0])
I then add my own logger API (in this case file-based)
logtf = logging.getLogger('DST')
logtf.setLevel(logging.DEBUG)
# file handler
logfile='/tmp/tf_s.log'
fh = logging.FileHandler(logfile)
fh.setFormatter( logging.Formatter('fh %(asctime)s %(name)s %(filename)s:%(lineno)d :%(message)s') )
logtf.addHandler(fh)
logtf.info('writing to %s', logfile)
I was struggling with this for a while; I tried almost all the solutions here but could not get rid of the debugging info in TF 1.14. I tried combining the following multiple solutions:
import os
import logging
import sys
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # FATAL
stderr = sys.stderr
sys.stderr = open(os.devnull, 'w')
import tensorflow as tf
tf.get_logger().setLevel(tf.compat.v1.logging.FATAL)
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
logging.getLogger('tensorflow').setLevel(tf.compat.v1.logging.FATAL)
sys.stderr = stderr
import absl.logging
logging.root.removeHandler(absl.logging._absl_handler)
absl.logging._warn_preinit_stderr = False
The debugging info still showed up, what finally helped was restarting my pc (actually restarting the kernel should work). So if somebody has similar problem, try restart kernel after you set your environment vars, simple but might not come in mind.
If you only need to get rid of warning outputs on the screen, you might want to clear the console screen right after importing TensorFlow using this simple command (it's more effective than disabling all debugging logs, in my experience):
In windows:
import os
os.system('cls')
In Linux or Mac:
import os
os.system('clear')
None of the solutions above could solve my problem in Jupyter Notebook, so I used the following snippet from Cicoria, and the issues were solved.
import warnings
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=FutureWarning)
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.preprocessing.text import Tokenizer
print('Done')
Most of the answers here work, but you have to use them every time you open a new session (e.g. with JupyterLab). To make the changes stick, you have to set the environment variable.
Linux:
export TF_CPP_MIN_LOG_LEVEL="3"
(Also add the above line to .bashrc to make the change permanent, not just for the session)
Windows:
setx TF_CPP_MIN_LOG_LEVEL "3"
Both set the environment variables for the user.
After testing various suggestions so that they could also silence the resulting executable built with PyInstaller, I came up with this setting:
import logging
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
logging.getLogger('tensorflow').setLevel(logging.ERROR)
import tensorflow as tf
The line
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
will silence the warning about rebuilding TensorFlow:
I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA.
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
The line
logging.getLogger('tensorflow').setLevel(logging.ERROR)
will silence the warning about AutoGraph:
WARNING:tensorflow:AutoGraph is not available in this environment: functions lack code information. This is typical of some environments like the interactive Python shell. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.
The key point is to place these two lines before importing TensorFlow, despite Pylint's warning!
With tensorflow 2.11.0, in Jupyter notebooks you can use the %env magic command:
%env TF_CPP_MIN_LOG_LEVEL=3
import tensorflow as tf

Why does the floatX's flag impact whether GPU is used in Theano?

I am testing Theano with GPU using the script provided in the tutorial for that purpose:
# Start gpu_test.py
# From http://deeplearning.net/software/theano/tutorial/using_gpu.html#using-gpu
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
# End gpu_test.py
If I specify floatX=float32, it runs on GPU:
francky#here:/fun$ THEANO_FLAGS='mode=FAST_RUN,device=gpu2,floatX=float32' python gpu_test.py
Using gpu device 2: GeForce GTX TITAN X (CNMeM is disabled)
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(Gp
Looping 1000 times took 1.458473 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296]
Used the gpu
If I do not specify floatX=float32, it runs on CPU:
francky#here:/fun$ THEANO_FLAGS='mode=FAST_RUN,device=gpu2'
Using gpu device 2: GeForce GTX TITAN X (CNMeM is disabled)
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 3.086261 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753
1.62323285]
Used the cpu
If I specify floatX=float64, it runs on CPU:
francky#here:/fun$ THEANO_FLAGS='mode=FAST_RUN,device=gpu2,floatX=float64' python gpu_test.py
Using gpu device 2: GeForce GTX TITAN X (CNMeM is disabled)
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 3.148040 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753
1.62323285]
Used the cpu
Why does the floatX flag impact whether GPU is used in Theano?
I use:
Theano 0.7.0 (according to pip freeze),
Python 2.7.6 64 bits (according to import platform; platform.architecture()),
Nvidia-smi 361.28 (according to nvidia-smi),
CUDA 7.5.17 (according to nvcc --version),
GeForce GTX Titan X (according to nvidia-smi),
Ubuntu 14.04.4 LTS x64 (according to lsb_release -a and uname -i).
I read the documentation on floatX but it didn't help. It simply says:
config.floatX String value: either ‘float64’ or ‘float32’
Default: ‘float64’
This sets the default dtype returned by tensor.matrix(), tensor.vector(), and similar functions. It also sets the default theano bit width for arguments passed as Python floating-point numbers.
From http://deeplearning.net/software/theano/tutorial/using_gpu.html#gpuarray-backend I read that it is possible to perform float64 calculations on GPU, but you have to install the libgpuarray from source.
I managed to install it (see this script); I used virtualenv, and you don't even need sudo.
After installation you can use the old backend with config flag device=gpu and the new backend with device=cuda.
The new backend can perform 64 bit calculations, but it works differently for me. Some operations stopped working. ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law :)
As far as I know, it's because they haven't yet implemented float64 for GPUs.
http://deeplearning.net/software/theano/tutorial/using_gpu.html :
Only computations with float32 data-type can be accelerated. Better support for float64 is expected in upcoming hardware but float64 computations are still relatively slow (Jan 2010).

How can I run theano on GPU

If I run the following code with python 3.5
import numpy as np
import time
import theano
A = np.random.rand(1000,10000).astype(theano.config.floatX)
B = np.random.rand(10000,1000).astype(theano.config.floatX)
np_start = time.time()
AB = A.dot(B)
np_end = time.time()
X,Y = theano.tensor.matrices('XY')
mf = theano.function([X,Y],X.dot(Y))
t_start = time.time()
tAB = mf(A,B)
t_end = time.time()
print ("NP time: %f[s], theano time: %f[s] **(times should be close when run
on CPU!)**" %(np_end-np_start, t_end-t_start))
print ("Result difference: %f" % (np.abs(AB-tAB).max(), ))
I get the output
NP time: 0.161123[s], theano time: 0.167119[s] (times should be close when run on CPU!)
Result difference: 0.000000
It says that if the times are close, it means that I am running on my CPU.
How can I run this code on my GPU?
NOTE:
I have a workstation with Nvidia Quadro k4200.
I have installed Cuda toolkit
I have successfully run the CUDA vectorAdd sample project in VS2012.
You configure Theano to use a GPU by specifying device=gpu in Theano's config. There are two principal methods for setting the config: (1) in the THEANO_FLAGS environment variable, or (2) via the .theanorc file. Both methods, and all of Theano's configuration flags, are documented.
You will know that Theano is using the GPU if, after calling import theano you see a message that looks something like this
Using gpu device 0: GeForce GT 640 (CNMeM is disabled)
The details may vary for you but if no message appears at all then Theano is using the CPU only.
Note also that even if you see the GPU message, your particular computation graph may not run on the GPU. To see which parts of your computation are running on the GPU, print its compiled and optimized graph:
f = theano.function(...)
theano.printing.debugprint(f)
Operations that start with the prefix 'Gpu' will run on the GPU. Operations that do not have that prefix to their name will run on the CPU.
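A rough, self-contained sketch of that check, assuming Theano has already been configured for the GPU as described above (the programmatic test at the end mirrors the tutorial script from the previous question):
import theano
import theano.tensor as T
# Build a tiny function and inspect whether its ops got the 'Gpu' prefix.
x = T.matrix('x')
f = theano.function([x], T.exp(x))
theano.printing.debugprint(f)  # look for GpuElemwise vs. plain Elemwise
op_names = [type(node.op).__name__ for node in f.maker.fgraph.toposort()]
print("Ran on GPU:", any(name.startswith('Gpu') for name in op_names))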
If you are on Linux, create a .theanorc file in your home folder and add the following to set up theano to run on GPU.
[global]
device = gpu
floatx = float32
Alternatively, if you want to enable the GPU programmatically:
import theano.sandbox.cuda
theano.sandbox.cuda.use("gpu0")
You should see a message like this:
Using gpu device 0: Tesla K80
Useful if the environment you are running in isn't easy to configure.
