Cannot assign a device to node - python

I followed this tutorial to export my own trained TensorFlow model to C++, and I got errors when I called freeze_graph:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:03:00.0)
...
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'save/Const_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Identity: CPU
Const: CPU
[[Node: save/Const_1 = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: model>, _device="/device:GPU:0"]()]]
Caused by op u'save/Const_1', defined at:
...
GPU:0 is detected and usable by TensorFlow, so I don't understand where the error comes from.
Any ideas?

The error means that the op save/Const_1 is being assigned to the GPU, and there is no GPU implementation of that node. As the colocation debug info shows, this Const node (a DT_STRING constant) supports only the CPU, and its value is stored as part of the Graph object, so it can't be placed on the GPU. One workaround is to run with allow_soft_placement=True; another is to open the pbtxt file and manually remove the device line for that node.

Related

How to set all tensors to cuda device?

The project needs to run its calculations on the GPU, but manually moving each tensor with .to(device) takes too long.
I used this, but the tensors still remain on the CPU.
if torch.cuda.is_available():
    torch.set_default_tensor_type(torch.cuda.FloatTensor)
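To create tensors on a CUDA device by default, you can set the default device instead of calling .to(device) on each tensor. Note that torch.set_default_tensor_type expects a tensor type such as torch.cuda.FloatTensor, not a torch.device; in PyTorch 2.0 and later, torch.set_default_device accepts a device. This only affects tensors created afterwards without an explicit device argument, so tensors that already exist still need .to(device). For example, to make the first CUDA device the default:
import torch
# Create new tensors on the first CUDA device by default (PyTorch 2.0+)
device = torch.device("cuda:0")
torch.set_default_device(device)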
Alternatively, you can also specify the device when you create a new tensor using the 'device' argument. For example:
import torch
# Create a single tensor directly on the first CUDA device
device = torch.device("cuda:0")
x = torch.zeros(10, device=device)
This will create a tensor 'x' on the first CUDA device.

tensorflow training freezes randomly on kaggle tpu

I'm using a Kaggle TPU to train a TensorFlow CycleGAN model. Everything is fine when training starts, but it freezes randomly after a few epochs. According to Kaggle, RAM usage did not blow up during training.
I get warnings like this during training:
2022-11-28 07:22:58.323282: W ./tensorflow/core/distributed_runtime/eager/destroy_tensor_handle_node.h:57] Ignoring an error encountered when deleting remote tensors handles: Invalid argument: Unable to find the relevant tensor remote_handle: Op ID: 89987, Output num: 0
Additional GRPC error information from remote target /job:worker/replica:0/task:0:
:{"created":"@1669620178.323159560","description":"Error received from peer ipv4:10.0.0.2:8470","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Unable to find the relevant tensor remote_handle: Op ID: 89987, Output num: 0","grpc_status":3}
Epoch 5/200
When I configure the TPU, I get these warnings:
2022-11-28 13:56:35.038036: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-28 13:56:35.040789: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/conda/lib
2022-11-28 13:56:35.040821: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-28 13:56:35.040850: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (06e37d3ac4e4): /proc/driver/nvidia/version does not exist
2022-11-28 13:56:35.043518: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-28 13:56:35.044759: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-28 13:56:35.079672: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> 10.0.0.2:8470}
2022-11-28 13:56:35.079743: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job localhost -> {0 -> localhost:30020}
2022-11-28 13:56:35.098707: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> 10.0.0.2:8470}
2022-11-28 13:56:35.098760: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job localhost -> {0 -> localhost:30020}
2022-11-28 13:56:35.101231: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:411] Started server with target: grpc://localhost:30020
The TensorFlow version is 2.4.1, and I haven't touched the other configs. My model.fit call looks like this:
history = gan_model.fit(gan_ds,
                        epochs=EPOCHS,
                        callbacks=[GANMonitor()],
                        steps_per_epoch=(max(n_monet_samples, n_photo_samples)//BATCH_SIZE),
                        verbose=2,
                        workers=0).history
Most of the code comes from a Kaggle tutorial, but I've changed the model architecture. Is there a way to solve this issue?
🙏
I've tried setting verbose=1 and saw that training freezes on a random step in the middle of an epoch. The number of epochs I can get through seems to depend on the model architecture and the batch size, so I suspect a memory issue.
I ran the two tutorials below on a v3-8 and encountered similar warnings in both runs.
https://www.kaggle.com/code/philculliton/a-simple-petals-tf-2-2-notebook
https://www.kaggle.com/code/amyjang/monet-cyclegan-tutorial
However, the warnings didn't break the training.
Could you please check whether the original tutorial code runs for a significant number of epochs? If it does, you might need to review your changes to the model architecture.
Also, if batch_size affects how many epochs complete, then it is most probably an out-of-memory error. Try reducing the batch_size, preferably to a factor of 128 per core, and see if the run completes.
More resources -
How improper batch_size can lead to OOM - https://cloud.google.com/tpu/docs/performance-guide#xla-efficiencies
Profiling guide - https://cloud.google.com/tpu/docs/cloud-tpu-tools
Feel free to explore our in-depth guides on TPUs with excellent tutorials - https://cloud.google.com/tpu/docs/intro-to-tpu
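The batch-size suggestion above comes down to simple arithmetic. The sketch below is only an illustration: it assumes a v3-8 with 8 cores, assumes that BATCH_SIZE in the question is the global batch that gets split evenly across the cores, and reads "a factor of 128 per core" as a per-core size that divides 128. The function names and the sample counts are hypothetical.

```python
NUM_CORES = 8  # assumption: a v3-8 exposes 8 TPU cores


def per_core_candidates(limit=128):
    """Per-core batch sizes that divide limit evenly (assumed reading of 'factor of 128')."""
    return [d for d in range(1, limit + 1) if limit % d == 0]


def global_batch(per_core, num_cores=NUM_CORES):
    """Global batch size when every core receives per_core samples."""
    return per_core * num_cores


def steps_per_epoch(n_monet_samples, n_photo_samples, batch_size):
    """Same formula as the fit() call in the question."""
    return max(n_monet_samples, n_photo_samples) // batch_size


# Hypothetical sample counts, chosen only to make the arithmetic concrete.
for per_core in per_core_candidates():
    batch = global_batch(per_core)
    print(per_core, batch, steps_per_epoch(1000, 4000, batch))
```

Stepping down through the candidates from the largest until the run completes is one way to check whether memory is really the limiting factor.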

Not able to use Embedding Layer with tf.distribute.MirroredStrategy

I am trying to parallelize a model with an embedding layer on TensorFlow 2.4.1, but it throws the following error:
InvalidArgumentError: Cannot assign a device for operation sequential/emb_layer/embedding_lookup/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential/emb_layer/embedding_lookup/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
GatherV2: GPU CPU XLA_CPU XLA_GPU
Cast: GPU CPU XLA_CPU XLA_GPU
Const: GPU CPU XLA_CPU XLA_GPU
ResourceSparseApplyAdagradV2: CPU
_Arg: GPU CPU XLA_CPU XLA_GPU
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU
Colocation members, user-requested devices, and framework assigned devices, if any:
sequential_emb_layer_embedding_lookup_readvariableop_resource (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
adagrad_adagrad_update_update_0_resourcesparseapplyadagradv2_accum (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
sequential/emb_layer/embedding_lookup/ReadVariableOp (ReadVariableOp)
sequential/emb_layer/embedding_lookup/axis (Const)
sequential/emb_layer/embedding_lookup (GatherV2)
gradient_tape/sequential/emb_layer/embedding_lookup/Shape (Const)
gradient_tape/sequential/emb_layer/embedding_lookup/Cast (Cast)
Adagrad/Adagrad/update/update_0/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0
[[{{node sequential/emb_layer/embedding_lookup/ReadVariableOp}}]] [Op:__inference_train_function_631]
I simplified the model to a minimal example to make the problem reproducible:
import tensorflow as tf
central_storage_strategy = tf.distribute.MirroredStrategy()
with central_storage_strategy.scope():
    user_model = tf.keras.Sequential([
        tf.keras.layers.Embedding(10, 2, name="emb_layer")
    ])
    user_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1), loss="mse")
user_model.fit([1], [[1, 2]], epochs=3)
Any help will be highly appreciated. Thanks!
I finally figured out the problem, in case anyone is looking for an answer.
TensorFlow does not have a complete GPU implementation of the Adagrad optimizer as of now. The ResourceSparseApplyAdagradV2 operation, which is used to update the embedding layer, fails on the GPU (the debug info lists it as CPU-only). So Adagrad cannot be used with an embedding layer under data-parallel strategies. Using Adam or RMSprop works fine.
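The colocation debug info in the error already points at the culprit: each op type in the group lists the devices it supports, and the group can only be placed on a device that all of them share. A minimal sketch of pulling out the op types that lack a GPU kernel follows. The helper name ops_without_gpu and the regular expression are assumptions for illustration; they are not a TensorFlow API, and the pattern only covers the device names that appear in the dump above.

```python
import re

# One "OpType: DEVICE DEVICE ..." line from the colocation debug info.
OP_LINE = re.compile(r"^\s*(\w+):\s+((?:(?:XLA_)?(?:CPU|GPU)\s*)+)$")


def ops_without_gpu(debug_info):
    """Return the op types whose supported-device list does not include GPU."""
    missing = []
    for line in debug_info.splitlines():
        match = OP_LINE.match(line)
        if match and "GPU" not in match.group(2).split():
            missing.append(match.group(1))
    return missing


# Lines copied from the error message in the question.
debug_info = """\
GatherV2: GPU CPU XLA_CPU XLA_GPU
Cast: GPU CPU XLA_CPU XLA_GPU
Const: GPU CPU XLA_CPU XLA_GPU
ResourceSparseApplyAdagradV2: CPU
_Arg: GPU CPU XLA_CPU XLA_GPU
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU
"""

print(ops_without_gpu(debug_info))
```

Run on the dump above, this reports ResourceSparseApplyAdagradV2 as the only op type without GPU support, which matches the diagnosis.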

Training a model using TensorFlow on a GPU with the Adadelta optimizer doesn't work, but when I replace Adadelta with Adam it seems to have no issues.

I'm trying to train a model with TensorFlow (v1.9.0 on Python 2) using the Adadelta optimizer on a GPU. It shows the following error.
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'embedding_matrix_de/read': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices:
UnsortedSegmentSum: GPU CPU
Unique: GPU CPU
Shape: GPU CPU
Cast: GPU CPU
StridedSlice: GPU CPU
GatherV2: GPU CPU
SparseApplyAdadelta: CPU
Const: GPU CPU
Identity: CPU
VariableV2: GPU CPU
Colocation members and user-requested devices:
embedding_matrix_de (VariableV2)
embedding_matrix_de/read (Identity)
embedding_lookup/axis (Const)
embedding_lookup (GatherV2)
gradients/embedding_lookup_grad/Shape (Const)
gradients/embedding_lookup_grad/ToInt32 (Cast)
embedding_matrix_de/Adadelta (VariableV2)
embedding_matrix_de/Adadelta_1 (VariableV2)
Adadelta/update_embedding_matrix_de/Unique (Unique)
Adadelta/update_embedding_matrix_de/Shape (Shape)
Adadelta/update_embedding_matrix_de/strided_slice/stack (Const)
Adadelta/update_embedding_matrix_de/strided_slice/stack_1 (Const)
Adadelta/update_embedding_matrix_de/strided_slice/stack_2 (Const)
Adadelta/update_embedding_matrix_de/strided_slice (StridedSlice)
Adadelta/update_embedding_matrix_de/UnsortedSegmentSum (UnsortedSegmentSum)
Adadelta/update_embedding_matrix_de/SparseApplyAdadelta (SparseApplyAdadelta)
[[Node: embedding_matrix_de/read = Identity[T=DT_FLOAT, _class=["loc:@embedding_matrix_de"]](embedding_matrix_de)]]
When I replace Adadelta with Adam, there are no issues. Some pieces of the code are given below.
....
embedding_matrix_decode = tf.get_variable(
    name="embedding_matrix_de",
    shape=[trainVocabSize, embedding_size],
    dtype=tf.float32)
....
optimizer = tf.train.AdadeltaOptimizer()
....
I encountered the same issue with TensorFlow 2.1.1. The Adadelta optimizer seems to have no support on either GPU or TPU.

Keras seems to hang after call to fit_generator

I am trying to fit the Keras implementation of the SqueezeDet model to a new dataset. After making the appropriate changes to my config file, I tried to run the train script, but it seems to hang after the call to fit_generator(). I get the following output:
/anaconda/envs/py35/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Number of images: 536
Number of epochs: 100
Number of batches: 53
Batch size: 10
2018-07-04 14:18:49.711606: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-04 14:18:54.080912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 52a9:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-07-04 14:18:54.080958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-04 14:18:54.333214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-04 14:18:54.333270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-07-04 14:18:54.333290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-07-04 14:18:54.333559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10764 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 52a9:00:00.0, compute capability: 3.7)
Learning rate: 0.01
Weights initialized by name from ../main/model/imagenet.h5
Using single GPU
Backend Qt5Agg is interactive backend. Turning interactive mode on.
Epoch 1/100
And then nothing happens, even if I leave it alone for a day. The call that it seems to freeze on is:
squeeze.model.fit_generator(train_generator, epochs=EPOCHS, verbose=1,
                            steps_per_epoch=nbatches_train, callbacks=cb)
Where the parameters are:
train_generator = generator_from_data_path(img_names, gt_names, config=cfg)
EPOCHS = 100
nbatches_train = 53
callbacks = [# TensorBoard object, ReduceLROnPlateau object, ModelCheckpoint object #]
My versions:
Python 3.5.4 :: Anaconda custom (64-bit)
tensorflow-gpu : 1.8.0
tensorflow : 1.8.0
Keras : 2.2.0
Converting the conversation in the comments into an answer.
The culprit was train_generator.
I looked into the source of model.fit_generator in Keras some time ago. It just retrieves some data from the generator and submits it to the backend, nothing magical :)
So my hypothesis was that it cannot retrieve data from the generator because the generator does not generate anything.
@Barker confirmed it, stating that the call to next(train_generator) hangs.
I personally have moved to keras.utils.Sequence, which supports indexing and length and is much more convenient than ordinary generators, though this note is not related to the current problem.
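A quick way to test this hypothesis before calling fit_generator is to pull a single batch with a timeout. This is a minimal sketch using only the standard library; the helper name first_batch_or_timeout and the two toy generators are hypothetical stand-ins for train_generator.

```python
import threading
import time


def first_batch_or_timeout(generator, timeout_s=10.0):
    """Try next(generator) in a worker thread.

    Returns (True, batch) if a batch arrives within timeout_s seconds,
    and (False, None) if the call is still blocked after the timeout.
    """
    result = {}

    def pull():
        try:
            result["batch"] = next(generator)
        except Exception as exc:  # e.g. StopIteration from an empty generator
            result["error"] = exc

    worker = threading.Thread(target=pull, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return False, None
    if "error" in result:
        raise result["error"]
    return True, result["batch"]


def healthy():
    """Toy generator that yields batches immediately."""
    while True:
        yield [1, 2, 3]


def stuck():
    """Toy generator that never reaches its yield, like a hanging data loader."""
    while True:
        time.sleep(0.05)
    yield  # unreachable, but makes this function a generator


print(first_batch_or_timeout(healthy(), timeout_s=1.0))
print(first_batch_or_timeout(stuck(), timeout_s=0.3))
```

If the check times out on the real generator, the problem lies in the data pipeline rather than in Keras or the GPU setup.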
