PyTorch | loss.backward() -> Missing XLA configuration - python

The loss is computed from a target model built with PyTorch (not TensorFlow). Forward propagation runs without problems, but when I call the line below during backpropagation, the process aborts with the following error message.
loss.backward()

terminate called after throwing an instance of 'std::runtime_error'
  what(): tensorflow/compiler/xla/xla_client/computation_client.cc:280 : Missing XLA configuration
Aborted
- pytorch (1.12.0+cu102)
- torchvision (0.13.0+cu102) <- the target model contains a pre-trained CNN that can be installed from torchvision.models
- google-compute-engine
- GPU (NVIDIA Tesla T4 x 1, CUDA 11.6) <- the code worked in an environment with CUDA 11.2, but it does not work in the current one; in the current environment the same error occurs even when the GPU is not used and everything runs on the CPU
- No TPU installed (I want to use the GPU, not a TPU)

The code works locally and also worked in the other GPU environments mentioned above; it stopped working when the environment was updated.
Please help me!

I solved this problem with the following command:
$ pip uninstall torch_xla
The error seems to be caused by the combination of pytorch-ignite and torch_xla.
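As a quick sanity check before (or instead of) uninstalling, you can verify whether torch_xla is even importable in the environment; this is a minimal diagnostic sketch of my own, not part of the original fix. If the package is present, it pulls in the XLA client that expects a TPU/XLA device configuration:

import importlib.util

# Hypothetical diagnostic: detect whether torch_xla exists in this environment.
if importlib.util.find_spec("torch_xla") is not None:
    print("torch_xla is installed; run `pip uninstall torch_xla` if you only need CUDA/CPU.")
else:
    print("torch_xla is not installed; the 'Missing XLA configuration' error comes from elsewhere.")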

Related

How to disable TensorFlow GPU?

I first wrote my TensorFlow code in Python to run on my GPU, with the tensorflow-gpu package installed:
import tensorflow as tf  # tensorflow-gpu package
I used it for training and everything went very well. But now I want to deploy my Python scripts on a device without a GPU. So I uninstalled tensorflow-gpu with pip and imported the normal TensorFlow package:
import tensorflow as tf
But when I run the script, it is still using the GPU.
I tried this code :
try:
    # Disable all GPUs
    tf.config.set_visible_devices([], 'GPU')
    visible_devices = tf.config.get_visible_devices()
    print(visible_devices)
    for device in visible_devices:
        assert device.device_type != 'GPU'
except Exception as e:
    # Invalid device or cannot modify virtual devices once initialized.
    print(e)
But it still does not work (even though the GPU seems to be disabled, as shown in white on the screenshot).
I just want to return to a default TensorFlow installation without GPU features.
I tried uninstalling and reinstalling tensorflow, and removing the virtual environment and creating a new one, but nothing worked.
Thanks for your help!
TensorFlow 2.x has GPU support by default; to know more, please visit the TensorFlow site.
As per the above screenshot, it is showing only the CPU.
Also observe the log line "Adding visible GPU devices: 0".
If you still want a CPU-only TensorFlow, use tensorflow==1.15.
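For completeness, a minimal sketch of the usual way to force CPU-only execution with a GPU-enabled TensorFlow 2.x build is to hide the GPUs before TensorFlow initializes any devices; this is a general technique, not something spelled out in the original answer:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # must be set before TensorFlow is imported

import tensorflow as tf
tf.config.set_visible_devices([], "GPU")   # additionally hide any GPUs from TensorFlow

print(tf.config.get_visible_devices())     # should list only CPU devices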

Converted ONNX model runs on CPU but not on GPU

I converted a TensorFlow Model to ONNX using this command:
python -m tf2onnx.convert --saved-model tensorflow-model-path --opset 10 --output model.onnx
The conversion was successful and I can run inference on the CPU after installing onnxruntime.
But when I create a new environment, install onnxruntime-gpu, and run inference on the GPU, I get different error messages depending on the model. For MobileNet, for example, I receive:
W:onnxruntime:Default, cuda_execution_provider.cc:1498 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: Conv node name: StatefulPartitionedCall/mobilenetv2_1.00_224/Conv1/Conv2D
I tried out different opsets.
Does anyone know why I am getting these errors when running on the GPU?
That is not an error. That is a warning and it is basically telling you that that particular Conv node will run on CPU (instead of GPU). It is most likely because the GPU backend does not yet support asymmetric paddings and there is a PR in progress to mitigate this issue - https://github.com/microsoft/onnxruntime/pull/4627. Once this PR is merged, these warnings should go away and such Conv nodes will run on the GPU backend.
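As an aside, a small sketch (assuming a reasonably recent onnxruntime-gpu) of how to confirm which execution providers the session actually has available; unsupported nodes like the Conv above simply fall back to the CPU provider while the rest of the graph stays on CUDA:

import onnxruntime as ort

# "model.onnx" is the file produced by the tf2onnx command above.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']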

Is it possible to run tensorflow-gpu on a computer without a GPU or CUDA?

I have two Windows computers, one with and one without a GPU.
I would like to deploy the same python script on both (TensorFlow 1.8 Object Detection), without changing the packaged version of TensorFlow. In other words, I want to run tensorflow-gpu on a CPU.
In the event where my script cannot detect nvcuda.dll, I've tried using a Session config to disable the GPU, like so:
config = tf.ConfigProto(
    device_count={'GPU': 0}
)
and:
with tf.Session(graph=detection_graph, config=config) as sess:
However, this is not sufficient, as TensorFlow still returns the error:
ImportError: Could not find 'nvcuda.dll'. TensorFlow requires that this DLL be installed in a directory that is named in your %PATH% environment variable.
Typically it is installed in 'C:\Windows\System32'. If it is not present, ensure that you have a CUDA-capable GPU with the correct driver installed.
Is there any way to disable checking for a GPU/CUDA entirely and default to CPU?
EDIT: I have read the year-old answer regarding tensorflow-gpu==1.0 on Linux posted here, which suggests this is impossible. I'm interested to know if this is still how tensorflow-gpu is compiled, 9 versions later.

Run Tensorflow on Jupyter notebook but kernel dead

I want to train a 5-layer DNN using TensorFlow in a Jupyter Notebook. It performs well during normal training.
But when I use cross-validation to find a good dropout rate, Jupyter reports that the kernel has died during training.
The Jupyter log:
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
My code is here.
From Googling, it may be caused by running out of memory. I tried reducing the batch size, but the error still occurs.
The code runs on Ubuntu 16.04 with 32 GB RAM and a GTX 1080 Ti GPU. The environment is Python 3.5, tensorflow 1.3.0, and tensorflow-gpu 1.3.0.
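(For reference, one common mitigation in TensorFlow 1.x when a process dies from resource exhaustion is to let the GPU allocator grow on demand instead of reserving all memory up front. This is a general sketch under that assumption, not a confirmed fix for this particular notebook.)

import tensorflow as tf  # assumes TensorFlow 1.x

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory as needed
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # optional hard cap

with tf.Session(config=config) as sess:
    pass  # build and train the DNN here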

Specifying GPU devices fails when using the TensorFlow C++ API

I trained my TF model in Python:
with sv.managed_session(master='') as sess:
    with tf.device("/gpu:1"):  # my system has 4 NVIDIA cards
and used the command line to extract the frozen model:
freeze_graph.py --clear_devices False
During the test phase, I set the device as follows:
tensorflow::graph::SetDefaultDevice("/gpu:1", &tensorflow_graph);
but something goes wrong:
Could not create Tensorflow Graph:
Invalid argument: Cannot assign a device to node '.../RNN_backword/while/Enter':
Could not satisfy explicit device specification '/gpu:1'
because no devices matching that specification are registered in this process;
available devices: /job:localhost/replica:0/task:0/cpu:0
So how can I use the GPU correctly? Can anyone help?
Is it possible you're using a version of TensorFlow without GPU support enabled? If you're building a binary you may need to add additional BUILD rules from //tensorflow that enable GPU support. Also ensure you enabled GPU support when running configure.
EDIT: Can you file a bug on TF's github issues with:
1) your BUILD rule
2) much more of your code so we can see how you're building your model and creating your session
3) how you ran configure
While this API is not yet marked "public", we want to see if there's indeed a bug you are running into so we can fix it.
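(To check the first point, here is a quick sketch in Python against the same TensorFlow build: list the devices the process actually registers; if only the CPU appears, the build was compiled without GPU support.)

from tensorflow.python.client import device_lib  # TF 1.x

for d in device_lib.list_local_devices():
    print(d.device_type, d.name)  # expect GPU entries if GPU support is compiled in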
