Disable Tensorflow debugging information - python

By debugging information I mean what TensorFlow shows in my terminal about loaded libraries and found devices, etc., not Python errors.
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Graphics Device
major: 5 minor: 2 memoryClockRate (GHz) 1.0885
pciBusID 0000:04:00.0
Total memory: 12.00GiB
Free memory: 11.83GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
...

You can disable all debugging logs using os.environ:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
Tested on tf 0.12 and 1.0.
In detail:
0 = all messages are logged (default behavior)
1 = INFO messages are not printed
2 = INFO and WARNING messages are not printed
3 = INFO, WARNING, and ERROR messages are not printed

2.0 Update (10/8/19)
Setting TF_CPP_MIN_LOG_LEVEL should still work (see below in the v0.12+ update), but there was a reported issue for versions 2.0 through 2.3.z, fixed in 2.4 and later. If setting TF_CPP_MIN_LOG_LEVEL does not work for you (again, see below), try the following to set the log level:
import tensorflow as tf
tf.get_logger().setLevel('INFO')
In addition, please see the documentation on tf.autograph.set_verbosity which sets the verbosity of autograph log messages - for example:
# Can also be set using the AUTOGRAPH_VERBOSITY environment variable
tf.autograph.set_verbosity(1)
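If you prefer the environment-variable route mentioned in the comment above, a minimal sketch (hedged: AutoGraph consults the variable at logging time, so set it early):
import os
os.environ['AUTOGRAPH_VERBOSITY'] = '1'  # same effect as tf.autograph.set_verbosity(1)
import tensorflow as tf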
v0.12+ Update (5/20/17), Working through TF 2.0+:
In TensorFlow 0.12+, per this issue, you can now control logging via the environmental variable called TF_CPP_MIN_LOG_LEVEL; it defaults to 0 (all logs shown) but can be set to one of the following values under the Level column.
Level | Level for Humans | Level Description
-------|------------------|------------------------------------
0 | INFO | [Default] Print all messages
1 | WARNING | Filter out INFO messages
2 | ERROR | Filter out INFO & WARNING messages
3 | NONE | Filter out all messages
See the following generic OS example using Python:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
import tensorflow as tf
You can set this environmental variable in the environment that you run your script in. For example, with bash this can be in the file ~/.bashrc, /etc/environment, /etc/profile, or in the actual shell as:
TF_CPP_MIN_LOG_LEVEL=2 python my_tf_script.py
To be thorough, you can also set the level for the Python tf_logging module, which is used in e.g. summary ops, tensorboard, various estimators, etc.
# append to lines above
tf.logging.set_verbosity(tf.logging.ERROR) # or any {DEBUG, INFO, WARN, ERROR, FATAL}
For 1.14 you will receive warnings if you do not change to use the v1 API as follows:
# append to lines above
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR) # or any {DEBUG, INFO, WARN, ERROR, FATAL}
For Prior Versions of TensorFlow or TF-Learn Logging (v0.11.x or lower):
View the page below for information on TensorFlow logging; with the new update, you're able to set the logging verbosity to either DEBUG, INFO, WARN, ERROR, or FATAL. For example:
tf.logging.set_verbosity(tf.logging.ERROR)
The page additionally goes over monitors which can be used with TF-Learn models. Here is the page.
This doesn't block all logging, though (only TF-Learn). I have two solutions; one is a 'technically correct' solution (Linux) and the other involves rebuilding TensorFlow.
script -c 'python [FILENAME].py' | grep -v 'I tensorflow/'
For the other, please see this answer which involves modifying source and rebuilding TensorFlow.

For compatibility with Tensorflow 2.0, you can use tf.get_logger
import logging
tf.get_logger().setLevel(logging.ERROR)

I have had this problem as well (on tensorflow-0.10.0rc0), but could not fix the excessive logging from nose tests via the suggested answers.
I managed to solve this by probing directly into the tensorflow logger. Not the most correct of fixes, but it works great and only pollutes the test files that directly or indirectly import tensorflow:
# Place this before directly or indirectly importing tensorflow
import logging
logging.getLogger("tensorflow").setLevel(logging.WARNING)

To anyone still struggling to get the os.environ solution to work as I was, check that this is placed before you import tensorflow in your script, just like mwweb's answer:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
import tensorflow as tf

I solved this with the post Cannot remove all warnings #27045, and the solution was:
import logging
logging.getLogger('tensorflow').disabled = True

I am using Tensorflow version 2.3.1, and none of the solutions above were fully effective.
Then I found this package.
Install like this:
with Anaconda,
python -m pip install silence-tensorflow
with IDEs,
pip install silence-tensorflow
And add to the first line of code:
from silence_tensorflow import silence_tensorflow
silence_tensorflow()
That's It!

As TF_CPP_MIN_LOG_LEVEL didn't work for me, you can try:
tf.logging.set_verbosity(tf.logging.WARN)
Worked for me in tensorflow v1.6.0

The usual Python 3 logging module works for me with tensorflow==1.11.0:
import logging
logging.getLogger('tensorflow').setLevel(logging.INFO)

For tensorflow 2.1.0, the following code works fine:
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

To add some flexibility here, you can achieve more fine-grained control over the level of logging by writing a function that filters out messages however you like:
logging.getLogger('tensorflow').addFilter(my_filter_func)
where my_filter_func accepts a LogRecord object as input [LogRecord docs] and returns zero if you want the message thrown out; nonzero otherwise.
Here's an example filter that only keeps every nth INFO message (Python 3, due to the use of nonlocal here):
def keep_every_nth_info(n):
    i = -1
    def filter_record(record):
        nonlocal i
        i += 1
        return int(record.levelname != 'INFO' or i % n == 0)
    return filter_record
# Example usage for TensorFlow:
logging.getLogger('tensorflow').addFilter(keep_every_nth_info(5))
All of the above has assumed that TensorFlow has set up its logging state already. You can ensure this without side effects by calling tf.logging.get_verbosity() before adding a filter.
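As a quick sanity check, the filter can also be exercised on a plain Python logger before attaching it to TensorFlow's (a minimal sketch; the logger name is illustrative):
import logging
demo = logging.getLogger('filter_demo')
demo.setLevel(logging.INFO)
demo.addHandler(logging.StreamHandler())
demo.addFilter(keep_every_nth_info(5))
for k in range(20):
    demo.info('message %d', k)  # only messages 0, 5, 10, 15 survive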

Yeah, I'm using tf 2.0-beta and want to enable/disable the default logging. The environment variable and methods in tf1.X don't seem to exist anymore.
I stepped around in PDB and found this to work:
# close the TF2 logger
tf2logger = tf.get_logger()
tf2logger.error('Close TF2 logger handlers')
tf2logger.root.removeHandler(tf2logger.root.handlers[0])
I then add my own logger API (in this case file-based)
logtf = logging.getLogger('DST')
logtf.setLevel(logging.DEBUG)
# file handler
logfile='/tmp/tf_s.log'
fh = logging.FileHandler(logfile)
fh.setFormatter( logging.Formatter('fh %(asctime)s %(name)s %(filename)s:%(lineno)d :%(message)s') )
logtf.addHandler(fh)
logtf.info('writing to %s', logfile)

I was struggling with this for a while and tried almost all the solutions here, but could not get rid of the debugging info in TF 1.14. I tried the following multiple solutions:
import os
import logging
import sys
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # FATAL
stderr = sys.stderr
sys.stderr = open(os.devnull, 'w')
import tensorflow as tf
tf.get_logger().setLevel(tf.compat.v1.logging.FATAL)
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
logging.getLogger('tensorflow').setLevel(tf.compat.v1.logging.FATAL)
sys.stderr = stderr
import absl.logging
logging.root.removeHandler(absl.logging._absl_handler)
absl.logging._warn_preinit_stderr = False
The debugging info still showed up; what finally helped was restarting my PC (actually, restarting the kernel should work). So if somebody has a similar problem, try restarting the kernel after you set your environment variables: simple, but it might not come to mind.

If you only need to get rid of warning outputs on the screen, you might want to clear the console screen right after importing tensorflow by using this simple command (it's more effective than disabling all debugging logs, in my experience):
In Windows:
import os
os.system('cls')
In Linux or Mac:
import os
os.system('clear')
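A cross-platform variant of the same idea (a minimal sketch; 'nt' is the os.name value on Windows):
import os
os.system('cls' if os.name == 'nt' else 'clear')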

None of the solutions above could solve my problem in Jupyter Notebook, so I used the following code snippet from Cicoria, and the issue was solved.
import warnings
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=FutureWarning)
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.preprocessing.text import Tokenizer
print('Done')

Most of the answers here work, but you have to use them every time you open a new session (e.g. with JupyterLab). To make the changes stick, you have to set the environment variable.
Linux:
export TF_CPP_MIN_LOG_LEVEL="3"
(Also add the above line to .bashrc to make the change permanent, not just for the session)
Windows:
setx TF_CPP_MIN_LOG_LEVEL "3"
Both set the environment variables for the user.
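To confirm the variable is actually visible to a fresh interpreter, you can check it from Python (a trivial sketch):
import os
print(os.environ.get('TF_CPP_MIN_LOG_LEVEL'))  # expect '3' after the commands above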

After testing various suggestions so that they could also silence the resulting executable built with PyInstaller, I came up with this setting:
import logging
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
logging.getLogger('tensorflow').setLevel(logging.ERROR)
import tensorflow as tf
The line
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
will silence the warning about rebuilding TensorFlow:
I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA.
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
The line
logging.getLogger('tensorflow').setLevel(logging.ERROR)
will silence the warning about AutoGraph:
WARNING:tensorflow:AutoGraph is not available in this environment: functions lack code information. This is typical of some environments like the interactive Python shell. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.
The key point is to place these two lines before importing Tensorflow, despite Pylint's warning!
Tested with tensorflow 2.11.0.

In Jupyter notebooks, you can use the %env magic command:
%env TF_CPP_MIN_LOG_LEVEL=3
import tensorflow as tf

Related

Numba printing information regarding Nvidia driver to python console when using its Cuda module. How to suppress this?

I have found that Numba is printing information regarding my Nvidia driver to my python console when using its Cuda module. For example, when using
numba.cuda.to_device(SOME_ARRAY)
for the first time,
INFO - 2020-12-21 19:16:22,163 - driver - init
is printed in red to my console. When using it any other time,
INFO - DATE TIME - driver - add pending dealloc: cuMemFree_v2 NUM_BYTES bytes
is printed. For example:
INFO - 2020-12-21 19:18:34,473 - driver - add pending dealloc: cuMemFree_v2 729120 bytes
Is there any way in which I could prevent Numba from printing this information?
As recommended in the comments, I have already tried setting
sys.stdout = open(os.devnull, "w")
sys.stderr = open(os.devnull, "w")
before running numba.cuda.to_device(), but this did not solve the issue.
I have also looked into Numba's environmental variables and have found nothing that solves the issue. To be specific, I have tried setting
os.environ['NUMBA_CUDA_LOG_LEVEL'] = str(logging.NOTSET)
and
os.environ['NUMBA_CUDA_LOG_LEVEL'] = str(logging.DEBUG)
And none of the two solved the issue. I have also tried setting
numba.config.CUDA_LOG_LEVEL = logging.NOTSET
and
numba.config.CUDA_LOG_LEVEL = logging.DEBUG
but once again this did not solve the issue.
After some time, I have found a solution to this problem. As hinted by talonmies in the comments above, these messages come from the Numba logger. On my machine, the logging level was set to "INFO" and Numba was printing directly to the console. To suppress the printing of information, add the following to your python program:
import logging
numba_logger = logging.getLogger('numba')
numba_logger.setLevel(logging.WARNING) # Or any other desired level that has a value above that of "INFO".

I have aberrant tensorflow output (with history) after simple functions? [duplicate]

Does anyone know if there is a method to prevent tensorflow from polluting standard error with its GPU memory allocation log?
I noted that when the following command is executed:
with tf.Session() as sess:
tensorflow prints on standard error a log about memory and GPU resource allocation. Something like:
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 48
Graphics Device pciBusID 0000:02:00.0
Free memory: 11.75GiB
...
For important reasons, I want to avoid this printing.
This was recently fixed, and should be available if you upgrade to TensorFlow 0.12 or later.
To disable all logging output from TensorFlow, set the following environment variable before launching Python:
$ export TF_CPP_MIN_LOG_LEVEL=3
$ python ...
You can also adjust the verbosity by changing the value of TF_CPP_MIN_LOG_LEVEL:
0 = all messages are logged (default behavior)
1 = INFO messages are not printed
2 = INFO and WARNING messages are not printed
3 = INFO, WARNING, and ERROR messages are not printed
You can set an environment variable before launching Python as described in the first answer, or you can add the following lines to your Python code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
Change 3 to a value in (0, 1, 2, 3) according to the messages you want to avoid.
If using TensorFlow 2.0+, make sure to put those lines before import tensorflow to be effective.
Defaults to 0, so all logs are shown. Set TF_CPP_MIN_LOG_LEVEL to 1 to filter out INFO logs, 2 to additionally filter out WARNING, and 3 to additionally filter out ERROR.
In TensorFlow 2.0, this does not work for all logging messages (I still get tf.function retracing warnings, for instance).
To make sure everything is turned off, do this
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
import logging
import tensorflow as tf
logger = tf.get_logger()
logger.setLevel(logging.ERROR) # or logging.INFO, logging.WARNING, etc.
Note 1: If you leave out the os.environ["TF_CPP_MIN_LOG_LEVEL"] line, some info messages are still printed (like the This TensorFlow binary is optimized with... that runs on startup.)
Note 2: in TF 1.0 you can also manually set the verbosity with tf.logging.set_verbosity(tf.logging.ERROR) but TF 2.0 does not have tf.logging.
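If you still want the old verbosity call under TF 2.x, it survives behind the compat shim (a one-line sketch):
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)  # TF 2.x replacement for the removed tf.logging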

Is there a way to suppress the messages TensorFlow prints?

I think that those messages are really important for the first few times, but then they are just useless.
They actually make things harder to read and debug.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3459] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
Is there a way to suppress the ones that just say it was successful?
UPDATE (beyond 1.14): see my more thorough answer here (this is a dupe question anyway): https://stackoverflow.com/a/38645250/6557588
In addition to Wintro's answer, you can also disable/suppress TensorFlow logs from the C side (i.e. the uglier ones starting with single characters: I, E, etc.); the open issue regarding logging has been updated to state that you can now control logging via an environmental variable. You can now change the level by setting the environmental variable called TF_CPP_MIN_LOG_LEVEL; it defaults to 0 (all logs shown), but can be set to 1 to filter out INFO logs, 2 to additionally filter out WARNING logs, and 3 to additionally filter out ERROR logs. It appears to be in master now, and will likely be a part of future versions (i.e. versions after r0.11). See this page for more information. Here is an example of changing the verbosity using Python:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
import tensorflow as tf
You can set this environmental variable in the environment that you run your script in. For example, with bash this can be in the file ~/.bashrc, /etc/environment, /etc/profile, or in the actual shell as:
TF_CPP_MIN_LOG_LEVEL=2 python my_tf_script.py
You can set the verbosity levels of TensorFlow's logging using
tf.logging.set_verbosity(tf.logging.ERROR)
where ERROR can be any of DEBUG, INFO, WARN, ERROR, or FATAL. See the logging module.
However, setting this to ERROR does not always completely block all INFO logs; to completely block them, you have two main choices in my opinion.
If you are using Linux, you can just grep out all output strings beginning with I tensorflow/.
Otherwise, you can completely rebuild TensorFlow with some modified files. See this answer.
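If you would rather stay in Python than pipe through grep, the same filtering can be approximated with subprocess (a hedged sketch; my_tf_script.py is a placeholder name):
import subprocess
# Run the script, capture stderr, and drop the C++-side lines that begin with "I tensorflow/"
proc = subprocess.run(['python', 'my_tf_script.py'], stderr=subprocess.PIPE, text=True)
for line in proc.stderr.splitlines():
    if not line.startswith('I tensorflow/'):
        print(line)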
If you're using TensorFlow version 1 (1.X), you can use
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
As of Tensorflow v1.14 (yes, including version 2.x) you can use the native logging module to silence Tensorflow:
import logging
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # FATAL
logging.getLogger('tensorflow').setLevel(logging.FATAL)
I personally use this in my projects:
def set_tf_loglevel(level):
    if level >= logging.FATAL:
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
    elif level >= logging.ERROR:
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    elif level >= logging.WARNING:
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
    else:
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'
    logging.getLogger('tensorflow').setLevel(level)
so that I can disable tf logging by running:
set_tf_loglevel(logging.FATAL)
and I can re-enable with
set_tf_loglevel(logging.INFO)
I created a function that shuts TF up. I call it at the start of my programs. Some messages are very annoying, and I cannot do anything about them...
def tensorflow_shutup():
    """
    Make Tensorflow less verbose
    """
    try:
        # noinspection PyPackageRequirements
        import os
        from tensorflow import logging
        logging.set_verbosity(logging.ERROR)
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

        # Monkey patching deprecation utils to shut it up! Maybe good idea to disable this once after upgrade
        # noinspection PyUnusedLocal
        def deprecated(date, instructions, warn_once=True):
            def deprecated_wrapper(func):
                return func
            return deprecated_wrapper

        from tensorflow.python.util import deprecation
        deprecation.deprecated = deprecated

    except ImportError:
        pass
Edit: This is for TF 2.0 and up:
def tensorflow_shutup():
    """
    Make Tensorflow less verbose
    """
    try:
        import os
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

        # noinspection PyPackageRequirements
        import tensorflow as tf
        from tensorflow.python.util import deprecation

        tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

        # Monkey patching deprecation utils to shut it up! Maybe good idea to disable this once after upgrade
        # noinspection PyUnusedLocal
        def deprecated(date, instructions, warn_once=True):  # pylint: disable=unused-argument
            def deprecated_wrapper(func):
                return func
            return deprecated_wrapper

        deprecation.deprecated = deprecated

    except ImportError:
        pass
To anyone still struggling to get the os.environ solution to work as I was, check that this is placed before you import tensorflow in your script, just like craymichael's answer:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
import tensorflow as tf
This is what worked for me:
import os
import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)
os.environ["KMP_AFFINITY"] = "noverbose"
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
tf.autograph.set_verbosity(3)
Considering previous answers, in Tensorflow 1.14 it is actually possible to eliminate all the messages generated by the library by including the following two lines in the code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
This method exploits both the environment variable approach and Tensorflow logging module.
Note: the compatibility version is necessary to avoid further warnings given by the library, since the standard one is now deprecated.
For tensorflow 2.2 you can disable the logging with the following lines:
import tensorflow as tf
tf.get_logger().setLevel('ERROR')
This works for me without introducing any packages other than TensorFlow itself:
import tensorflow as tf
tf.autograph.set_verbosity(0) # "0" means no logging.
For more details, check this TensorFlow API documentation.

How to get current available GPUs in tensorflow?

I have a plan to use distributed TensorFlow, and I saw that TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have 0 or 1 or more GPUs, and I want to run my TensorFlow graph on GPUs on as many machines as possible.
I found that when running tf.Session() TensorFlow gives information about GPU in the log messages like below:
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
My question is how do I get information about current available GPU from TensorFlow? I can get loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way.
I also could restrict GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I don't want to know a way of getting GPU information from OS kernel.
In short, I want a function like tf.get_available_gpus() that will return ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?
There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:
from tensorflow.python.client import device_lib
def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, will allocate all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or allow_growth=True, to prevent all of the memory being allocated. See this question for more details.
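A hedged sketch of that workaround with the TF 1.x API (the fraction value is arbitrary):
import tensorflow as tf
from tensorflow.python.client import device_lib
# Claim only a small, fixed slice of GPU memory before enumerating devices
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.01)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
print(device_lib.list_local_devices())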
You can check the full device list using the following code:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
There is also a method in the test util.
So all that has to be done is:
tf.test.is_gpu_available()
and/or
tf.test.gpu_device_name()
Look up the Tensorflow docs for arguments.
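For example (a minimal sketch; these test utilities exist in TF 1.x and remain in 2.x, though newer releases favor tf.config.list_physical_devices):
import tensorflow as tf
if tf.test.is_gpu_available():
    print('Default GPU:', tf.test.gpu_device_name())
else:
    print('No GPU found')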
Since TensorFlow 2.1, you can use tf.config.list_physical_devices('GPU'):
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, " Type:", gpu.device_type)
If you have two GPUs installed, it outputs this:
Name: /physical_device:GPU:0 Type: GPU
Name: /physical_device:GPU:1 Type: GPU
In TF 2.0, you must add experimental:
gpus = tf.config.experimental.list_physical_devices('GPU')
See:
Guide pages
Current API
The accepted answer gives you the number of GPUs, but it also allocates all the memory on those GPUs, which may be unwanted for some applications. You can avoid this by creating a session with a fixed lower memory limit before calling device_lib.list_local_devices().
I ended up using nvidia-smi to get the number of GPUs without allocating any memory on them.
import subprocess
n = str(subprocess.check_output(["nvidia-smi", "-L"])).count('UUID')
Apart from the excellent explanation by Mrry, where he suggested using device_lib.list_local_devices(), I can show you how you can check for GPU-related information from the command line.
Because currently only Nvidia GPUs work for NN frameworks, the answer covers only them. Nvidia has a page where they document how you can use the /proc filesystem interface to obtain run-time information about the driver, any installed NVIDIA graphics cards, and the AGP status.
/proc/driver/nvidia/gpus/0..N/information
Provides information about each of the installed NVIDIA graphics adapters (model name, IRQ, BIOS version, Bus Type). Note that the BIOS version is only available while X is running.
So you can run this from the command line, cat /proc/driver/nvidia/gpus/0/information, and see information about your first GPU. It is easy to run this from Python, and you can also check the second, third, and fourth GPU until the read fails.
Mrry's answer is definitely more robust, and I am not sure whether mine will work on a non-Linux machine, but Nvidia's page provides other interesting information which not many people know about.
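A minimal sketch of reading those files from Python (Linux only; the wildcard covers drivers that key the directory by index or by PCI bus ID):
import glob
for path in sorted(glob.glob('/proc/driver/nvidia/gpus/*/information')):
    with open(path) as f:
        print(path)
        print(f.read())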
The following works in tensorflow 2:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, " Type:", gpu.device_type)
From 2.1, you can drop experimental:
gpus = tf.config.list_physical_devices('GPU')
https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices
I got a GPU called NVIDIA GeForce GTX 1650 Ti on my machine with tensorflow-gpu==2.2.0.
Run the following two lines of code:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Output:
Num GPUs Available: 1
In TensorFlow Core v2.3.0, the following code should work.
import tensorflow as tf
visible_devices = tf.config.get_visible_devices()
for devices in visible_devices:
    print(devices)
Depending on your environment, this code will produce the following results:
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
The latest API recommended by tensorflow:
tf.config.list_physical_devices('GPU')
I am working with TF 2.1 and torch, so I don't want to hard-code this automatic choosing into any one ML framework. I just use the original nvidia-smi and os.environ to get a vacant GPU.
import os
import subprocess

def auto_gpu_selection(usage_max=0.01, mem_max=0.05):
    """Auto-set CUDA_VISIBLE_DEVICES to a vacant GPU.
    :param usage_max: max percentage of GPU utilization
    :param mem_max: max percentage of GPU memory in use
    :return:
    """
    os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
    log = str(subprocess.check_output("nvidia-smi", shell=True)).split(r"\n")[6:-1]
    gpu = 0
    # Maximum of GPUs; 8 is enough for most
    for i in range(8):
        idx = i*3 + 2
        if idx > len(log) - 1:
            break
        inf = log[idx].split("|")
        if len(inf) < 3:
            break
        usage = int(inf[3].split("%")[0].strip())
        mem_now = int(str(inf[2].split("/")[0]).strip()[:-3])
        mem_all = int(str(inf[2].split("/")[1]).strip()[:-3])
        # print("GPU-%d : Usage:[%d%%]" % (gpu, usage))
        if usage < 100*usage_max and mem_now < mem_max*mem_all:
            os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
            print("\nAuto choosing vacant GPU-%d : Memory:[%dMiB/%dMiB] , GPU-Util:[%d%%]\n" %
                  (gpu, mem_now, mem_all, usage))
            return
        print("GPU-%d is busy: Memory:[%dMiB/%dMiB] , GPU-Util:[%d%%]" %
              (gpu, mem_now, mem_all, usage))
        gpu += 1
    print("\nNo vacant GPU, use CPU instead\n")
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
If it can find a vacant GPU, it will set CUDA_VISIBLE_DEVICES to the bus ID of that GPU:
GPU-0 is busy: Memory:[5738MiB/11019MiB] , GPU-Util:[60%]
GPU-1 is busy: Memory:[9688MiB/11019MiB] , GPU-Util:[78%]
Auto choosing vacant GPU-2 : Memory:[1MiB/11019MiB] , GPU-Util:[0%]
Otherwise, it sets CUDA_VISIBLE_DEVICES to -1 to use the CPU:
GPU-0 is busy: Memory:[8900MiB/11019MiB] , GPU-Util:[95%]
GPU-1 is busy: Memory:[4674MiB/11019MiB] , GPU-Util:[35%]
GPU-2 is busy: Memory:[9784MiB/11016MiB] , GPU-Util:[74%]
No vacant GPU, use CPU instead
Note: Use this function before you import any ML framework that requires a GPU; it can then automatically choose a GPU. It also makes it easy to set up multiple tasks.
Use this approach to check all parts:
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
version = tf.__version__
executing_eagerly = tf.executing_eagerly()
hub_version = hub.__version__
available = tf.config.experimental.list_physical_devices("GPU")
print("Version: ", version)
print("Eager mode: ", executing_eagerly)
print("Hub Version: ", hub_version)
print("GPU is", "available" if available else "NOT AVAILABLE")
Ensure you have the latest TensorFlow 2.x GPU build installed on your GPU-supporting machine, then execute the following code in Python:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
You will get output that looks like:
2020-02-07 10:45:37.587838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-07 10:45:37.588896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
Num GPUs Available: 8
Run the following in any shell
python -c "import tensorflow as tf; print(\"Num GPUs Available: \", len(tf.config.list_physical_devices('GPU')))"
You can use the following code snippet to show the device name, type, memory, and locality.
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

How can I run theano on GPU

If I run the following code with python 3.5
import numpy as np
import time
import theano
A = np.random.rand(1000,10000).astype(theano.config.floatX)
B = np.random.rand(10000,1000).astype(theano.config.floatX)
np_start = time.time()
AB = A.dot(B)
np_end = time.time()
X,Y = theano.tensor.matrices('XY')
mf = theano.function([X,Y],X.dot(Y))
t_start = time.time()
tAB = mf(A,B)
t_end = time.time()
print ("NP time: %f[s], theano time: %f[s] **(times should be close when run
on CPU!)**" %(np_end-np_start, t_end-t_start))
print ("Result difference: %f" % (np.abs(AB-tAB).max(), ))
I get the output
NP time: 0.161123[s], theano time: 0.167119[s] (times should be close when run on CPU!)
Result difference: 0.000000
It says that if the times are close, it means I am running on my CPU.
How can I run this code on my GPU?
NOTE:
I have a workstation with Nvidia Quadro k4200.
I have installed the CUDA toolkit.
I have successfully run the CUDA vectorAdd sample project in VS2012.
You configure Theano to use a GPU by specifying device=gpu in Theano's config. There are two principal methods for setting the config: (1) in the THEANO_FLAGS environment variable, or (2) via the .theanorc file. Both methods, and all of Theano's configuration flags, are documented.
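For example, the environment-variable route can be taken from Python itself (a minimal sketch using the same flags as the .theanorc example further below; the variable must be set before theano is imported):
import os
os.environ['THEANO_FLAGS'] = 'device=gpu,floatX=float32'  # read by theano at import time
import theano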
You will know that Theano is using the GPU if, after calling import theano, you see a message that looks something like this:
Using gpu device 0: GeForce GT 640 (CNMeM is disabled)
The details may vary for you, but if no message appears at all, then Theano is using the CPU only.
Note also that even if you see the GPU message, your particular computation graph may not run on the GPU. To see which parts of your computation are running on the GPU, print its compiled and optimized graph:
f = theano.function(...)
theano.printing.debugprint(f)
Operations that start with the prefix 'Gpu' will run on the GPU. Operations that do not have that prefix to their name will run on the CPU.
If you are on Linux, create a .theanorc file in your home folder and add the following to set up theano to run on GPU.
[global]
device = gpu
floatX = float32
Alternatively, if you want to use the GPU programmatically:
import theano.sandbox.cuda
theano.sandbox.cuda.use("gpu0")
You should see a message like this:
Using gpu device 0: Tesla K80
Useful if the environment you are running in isn't easy to configure.
