I want to train a 5-layer DNN using TensorFlow in a Jupyter Notebook. It performs well under normal training.
But when I use cross-validation to find a good dropout rate, Jupyter reports during training that the kernel has died.
The Jupyter log:
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
My code is here.
From searching, I found that this may be caused by running out of memory. I tried reducing the batch size, but the error still occurred.
The code runs on Ubuntu 16.04 with 32 GB RAM and a 1080 Ti GPU. The environment is Python 3.5, tensorflow 1.3.0 and tensorflow-gpu 1.3.0.
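One possible cause, offered as an assumption rather than a diagnosis: a std::system_error with "Resource temporarily unavailable" is often raised when a process cannot create another thread, which can happen if every fold builds a new graph and session without releasing the previous one. In TensorFlow 1.x that would mean calling tf.reset_default_graph() before each fold and closing each session when the fold finishes. The framework-free sketch below shows only the loop structure. The names k_fold_indices, search_dropout and train_and_score are hypothetical, and train_and_score stands in for the code that builds, trains, scores and then tears down one model.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


def search_dropout(rates, n_samples, k, train_and_score):
    """Return (best_rate, mean_scores) over the candidate dropout rates.

    train_and_score(rate, train_idx, val_idx) is a hypothetical callback.
    It should build a fresh model, train it, return a validation score
    (higher is better), and release its own resources before returning.
    """
    mean_scores = {}
    for rate in rates:
        scores = [
            train_and_score(rate, train_idx, val_idx)
            for train_idx, val_idx in k_fold_indices(n_samples, k)
        ]
        mean_scores[rate] = sum(scores) / len(scores)
    best_rate = max(mean_scores, key=mean_scores.get)
    return best_rate, mean_scores
```

Keeping all per-fold setup and teardown inside the callback makes it easier to confirm that nothing from one fold survives into the next.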
Related
The loss is computed from a target model built with PyTorch (not TensorFlow). During backpropagation, the line below fails with the following error message.
loss.backward()
(Forward propagation can be calculated without problems.)
terminate called after throwing an instance of 'std::runtime_error'
what(): tensorflow/compiler/xla/xla_client/computation_client.cc:280 : Missing XLA configuration
Aborted
- pytorch (1.12.0+cu102)
- torchvision (0.13.0+cu102) <- the target model contains a pre-trained CNN model from torchvision.models
- google-compute-engine
- GPU (NVIDIA Tesla T4 x 1, 11.6) <- The code worked in an environment with GPU (11.2), but it does not work in the current environment. In the current environment, the same error occurs even when the CPU is used instead of the GPU.
- TPU is not installed (I want to use the GPU, not a TPU)
The code works locally and also worked in other GPU environments, as mentioned above. It stopped working when the environment was updated.
Please help me···
I solved this problem with the following command:
$ pip uninstall torch_xla
The error seems to have been caused by pytorch-ignite and torch_xla.
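Before uninstalling anything, it may help to confirm that torch_xla is actually present in the active environment. A minimal sketch using only the standard library follows; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("torch_xla", "ignite"):
        print(name, "installed:", is_installed(name))
```

If torch_xla shows up in an environment with no TPU, that matches the situation described above.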
I am a Windows 10 user.
I'm running TensorFlow in a Jupyter notebook. A perceptron works fine, but a CNN crashes the kernel.
I also tried increasing the buffer size, but the kernel keeps crashing.
How do I fix it?
I converted a TensorFlow model to ONNX using this command:
python -m tf2onnx.convert --saved-model tensorflow-model-path --opset 10 --output model.onnx
The conversion was successful, and I can run inference on the CPU after installing onnxruntime.
But when I create a new environment, install onnxruntime-gpu in it, and run inference on the GPU, I get different error messages depending on the model. For example, for MobileNet I receive:
W:onnxruntime:Default, cuda_execution_provider.cc:1498 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: Conv node name: StatefulPartitionedCall/mobilenetv2_1.00_224/Conv1/Conv2D
I tried different opsets.
Does anyone know why I am getting errors when running on the GPU?
That is not an error. It is a warning, and it is telling you that this particular Conv node will run on the CPU instead of the GPU. It is most likely because the GPU backend does not yet support asymmetric paddings, and there is a PR in progress to mitigate this issue: https://github.com/microsoft/onnxruntime/pull/4627. Once this PR is merged, these warnings should go away and such Conv nodes will run on the GPU backend.
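To see which execution providers are in play, something like the sketch below may help. It assumes the onnxruntime (or onnxruntime-gpu) package is installed; the function names and the model path passed in are placeholders, not part of the original question.

```python
def cuda_provider_available():
    """Return True/False for CUDA support, or None if onnxruntime is missing."""
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    return "CUDAExecutionProvider" in ort.get_available_providers()


def make_session(model_path):
    """Create a session that prefers CUDA and falls back to the CPU.

    model_path is a placeholder for the converted model, e.g. "model.onnx".
    """
    import onnxruntime as ort

    session = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # get_providers() lists the providers this session actually uses.
    print("Session providers:", session.get_providers())
    return session
```

If the CUDA provider is listed for the session, individual nodes can still fall back to the CPU, which is what the warning above reports.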
I have installed Keras with GPU support in R, based on TensorFlow with GPU support. I installed it by following these steps:
https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781
If I run the Boston Housing example code from the book Deep Learning with R, I receive this screen:
Can I conclude that the code runs on the GPU?
Or does this line from the picture above indicate an error:
GPU libraries are statically linked, skip dlopen check.
While the code is running, the GPU is at only 3% of capacity while the CPU is at 20-25%.
The code is NOT running faster than it did before I installed GPU support.
Thank you!
Yes, TensorFlow is running with the GPU enabled. Boston Housing is a relatively small dataset and probably does not benefit much from using the GPU. The line below indicates it is running on the GPU: "Created tensorflow device (/job:localhost/replica:0/task:0/device:GPU:0".
From the TensorFlow guide:
You can set tf.debugging.set_log_device_placement(True) to see explicitly where each operation is running. The R equivalent is below.
library(tensorflow)
tf$debugging$set_log_device_placement(TRUE)
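On the Python side, a quick way to check whether TensorFlow sees a GPU at all is sketched below. It assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available; the helper name visible_gpus is made up for illustration.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if unavailable.

    None means TensorFlow is not installed, or is too old to provide
    tf.config.list_physical_devices.
    """
    try:
        import tensorflow as tf
        devices = tf.config.list_physical_devices("GPU")
    except (ImportError, AttributeError):
        return None
    return [device.name for device in devices]


if __name__ == "__main__":
    print("GPUs visible to TensorFlow:", visible_gpus())
```

An empty list would mean TensorFlow is installed but is running on the CPU only.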
When I run a Python script from the command line, for example python test.py, the GPU memory is released as soon as the script finishes.
In this test.py script, I simply load a model built with Keras to evaluate and predict on some data. There is no training in it.
However, if I open Spyder and run this script there, the results appear in the IPython console, but when I then run nvidia-smi from the command line, the GPU memory has not been released.
So what I tried is closing this IPython kernel and starting a new one, but then all my other variables are lost. Is there a decent way to release the GPU memory after model.evaluate(x, y) from Spyder?
Here are some screen shots:
Before and after running the script from 'spyder':
Normally, the TensorFlow backend reserves all the memory on the GPU. It may not actually use all of it, but the memory stays reserved and unavailable to other programs until the TensorFlow backend is terminated. So nvidia-smi shows the memory as not released even when TensorFlow has freed it within its own framework.
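Since the memory is returned when the process that owns it exits, one workaround is to launch the script as a separate process from inside the IPython console, which mirrors running python test.py from the command line. The sketch below uses only the standard library; the helper name run_script_in_subprocess is hypothetical, and test.py is the script from the question.

```python
import subprocess
import sys


def run_script_in_subprocess(script_path, *args):
    """Run a Python script in a separate interpreter and return its stdout.

    Any GPU memory the script reserves belongs to the child process and
    is released when that process exits, while the calling session and
    its variables stay alive.
    """
    completed = subprocess.run(
        [sys.executable, script_path] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return completed.stdout


# Example use from the IPython console:
# output = run_script_in_subprocess("test.py")
```

The trade-off is that results must be passed back through stdout or a file, because the child process does not share variables with the console.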