I trained my TF model in Python:
with sv.managed_session(master='') as sess:
    with tf.device("/gpu:1"):  # my system has 4 NVIDIA cards
and used the command line to freeze the model:
freeze_graph.py --clear_devices False
and during the test phase, I set the device as follows:
tensorflow::graph::SetDefaultDevice("/gpu:1", &tensorflow_graph);
but something is wrong:
Could not create TensorFlow Graph:
Invalid argument: Cannot assign a device to node '.../RNN_backword/while/Enter':
Could not satisfy explicit device specification '/gpu:1'
because no devices matching that specification are registered in this process;
available devices: /job:localhost/replica:0/task:0/cpu:0
So, how can I use the GPU correctly?
Could anyone help?
Is it possible you're using a version of TensorFlow without GPU support enabled? If you're building a binary, you may need to add additional BUILD rules from //tensorflow that enable GPU support. Also make sure you enabled GPU support when running configure.
EDIT: Can you file a bug on TF's github issues with:
1) your BUILD rule
2) much more of your code so we can see how you're building your model and creating your session
3) how you ran configure
While this API is not yet marked "public", we want to see if there's indeed a bug you are running into so we can fix it.
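As a quick sanity check, you can list the devices TensorFlow has registered in the process (a minimal sketch, assuming the Python build matches the one your C++ binary links against); if only the CPU shows up, the build has no GPU support:

from tensorflow.python.client import device_lib

# Print every device registered in this process; with GPU support enabled
# you should see /device:GPU:0 .. /device:GPU:3 in addition to the CPU.
for d in device_lib.list_local_devices():
    print(d.name)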
Related
The loss is calculated from a target model created using PyTorch (not TensorFlow). When backpropagating, I run the code below and get the following error message.
loss.backward()
(Forward propagation can be calculated without problems.)
terminate called after throwing an instance of 'std::runtime_error'
what(): tensorflow/compiler/xla/xla_client/computation_client.cc:280 : Missing XLA configuration
Aborted
- PyTorch 1.12.0+cu102
- torchvision 0.13.0+cu102 (the target model contains a pre-trained CNN that can be installed from torchvision.models)
- Google Compute Engine
- GPU: NVIDIA Tesla T4 x 1, 11.6 (the code worked in an environment where 11.2 was installed, but it does not work in the current environment; in the current environment the same error occurs even if I use the CPU instead of the GPU)
- TPU is not installed (I don't want to use a TPU, but the GPU)
The code works locally and was also working in other GPU environments, as mentioned above. It stopped working when the environment was updated.
Please help me.
I solved this problem with the following command:
$ pip uninstall torch_xla
This error seemed to be caused by pytorch-ignite and torch_xla.
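If you just want to check whether torch_xla is present before uninstalling it, a minimal sketch:

import importlib.util

# If this prints a ModuleSpec, torch_xla is installed and can be pulled in
# indirectly (e.g. by pytorch-ignite), which then demands an XLA configuration.
print(importlib.util.find_spec("torch_xla"))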
There is an annoying problem when using a libtorch function from PyTorch (Python).
The device setting in libtorch is torch::kCUDA, which by default means CUDA:0.
But when I use it from Python PyTorch, I want to use a different CUDA device, not CUDA:0.
I have changed CUDA_VISIBLE_DEVICES in Python, but this doesn't affect the libtorch device setting.
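For illustration, this is roughly what I mean on the Python side (a sketch; my_libtorch_ext is a hypothetical name for my extension that creates tensors on torch::kCUDA internally):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # set before torch/CUDA is initialized

import torch
if torch.cuda.is_available():
    print(torch.cuda.current_device())  # 0 here maps to physical GPU 1 for Python-side tensors
# import my_libtorch_ext  # hypothetical: its torch::kCUDA default still ends up on CUDA:0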
Is there any solution for this condition?
Thanks~
I have a saved transformers model, test_model, that I load with BertModel.from_pretrained('test_model').
I trained this model using Google Colab's GPUs.
Then I want to open it with BertModel.from_pretrained('test_model/'), but I do not have a GPU on my local PC. I get this:
/home/seiji/.local/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
What should I do? I have no idea how I can open it using a CPU. Is it even possible?
The best thing you can do is save the CPU version of the model, i.e.:
model.cpu().save_pretrained("model_directory")
All the pre-trained Huggingface models are saved as CPU models anyway and you always need to move them to GPU explicitly.
PyTorch allows loading GPU models on CPU (see https://discuss.pytorch.org/t/on-a-cpu-device-how-to-load-checkpoint-saved-on-gpu-device/349), but the arguments of torch.load you would need to set are not exposed via the API, so you would need to write your own from_pretrained method.
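For reference, the torch.load argument in question is map_location; a minimal sketch (pytorch_model.bin is the usual transformers checkpoint file name, adjust if yours differs):

import torch

# Remap a checkpoint that was saved on a GPU onto the CPU while loading it.
state_dict = torch.load("test_model/pytorch_model.bin", map_location=torch.device("cpu"))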
I converted a TensorFlow Model to ONNX using this command:
python -m tf2onnx.convert --saved-model tensorflow-model-path --opset 10 --output model.onnx
The conversion was successful and I can inference on the CPU after installing onnxruntime.
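Roughly, the working CPU inference looks like this (a sketch; the dummy input shape is an assumption and has to match the model):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")  # CPU-only onnxruntime package
inp = sess.get_inputs()[0]                 # query the model's input metadata
dummy = np.zeros((1, 224, 224, 3), dtype=np.float32)  # assumed NHWC input for MobileNetV2
print(sess.run(None, {inp.name: dummy})[0].shape)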
But when I create a new environment, install onnxruntime-gpu in it, and run inference on the GPU, I get different error messages depending on the model. E.g. for MobileNet I receive:
W:onnxruntime:Default, cuda_execution_provider.cc:1498 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: Conv node name: StatefulPartitionedCall/mobilenetv2_1.00_224/Conv1/Conv2D
I tried out different opsets.
Does anyone know why I am getting these errors when running on the GPU?
That is not an error. That is a warning and it is basically telling you that that particular Conv node will run on CPU (instead of GPU). It is most likely because the GPU backend does not yet support asymmetric paddings and there is a PR in progress to mitigate this issue - https://github.com/microsoft/onnxruntime/pull/4627. Once this PR is merged, these warnings should go away and such Conv nodes will run on the GPU backend.
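If you want to confirm which execution providers the session actually registered (a minimal sketch, assuming onnxruntime-gpu is installed):

import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# CUDAExecutionProvider should be listed first; nodes it cannot handle (like the
# Conv above) simply fall back to the CPUExecutionProvider at runtime.
print(sess.get_providers())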
I have installed Keras with GPU support in R, based on TensorFlow with GPU support, following these steps:
https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781
If I run the Boston Housing example code from the book Deep Learning with R, I receive this screen:
Can I conclude that the code runs on the GPU?
Or is this line from the picture above giving an error:
GPU libraries are statically linked, skip dlopen check.
While the code is running, the GPU is only at 3% of capacity, while the CPU is at 20-25%.
The code is NOT running faster than when I initially ran it without GPU support.
Thank you!
Yes, TensorFlow is running with GPU enabled. Boston Housing is a relatively small dataset and probably does not benefit much from using the GPU. The line below indicates it is running on the GPU: "Created tensorflow device (/job:localhost/replica:0/task:0/device:GPU:0".
From the guide at TensorFlow:
You can set tf.debugging.set_log_device_placement(True) in order to explicitly see where each operation is running. The R equivalent is below.
library(tensorflow)
tf$debugging$set_log_device_placement(TRUE)