Recently I was reproducing the model from the D3ST paper, so I needed to load its checkpoint. Out of personal programming habit I prefer to work in PyTorch. When loading the checkpoint, it keeps telling me that it is not running on a TPU, so the checkpoint cannot be loaded.
The error message looks like this:
No OpKernel was registered to support Op 'TPUReplicatedInput' used by {{node input0}} with these attrs: [N=32, is_packed=false, is_mirrored_variable=false, index=0, T=DT_INT32]
Registered devices: [CPU]
Registered kernels:
<no registered kernels>
The checkpoints are available at: https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md
I tried to run this program on a Colab TPU, but I don't know where the problem is; the program keeps telling me that it is running on the CPU. I'm quite confused.
Here is the original code:
import os
import tensorflow as tf

# Restore the graph and weights from the checkpoint, then re-save them
with tf.device('/TPU:0'):
    with tf.compat.v1.Session() as sess:
        saver = tf.compat.v1.train.import_meta_graph(os.path.join(path, meta_path))
        saver.restore(sess, os.path.join(path, checkpoint_path))
        saver.save(sess, "model.ckpt")
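For reference, a v1-style session only sees TPU devices when it is opened against the TPU worker's address rather than the local (CPU-only) target; a minimal, untested sketch of how that is usually done on Colab (reusing path, meta_path, and checkpoint_path from the snippet above):

import os
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# On Colab the resolver usually picks up the TPU address automatically;
# pass tpu='grpc://...' explicitly if it does not
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()

# Open the session against the TPU worker so its devices are visible
with tf.compat.v1.Session(resolver.master()) as sess:
    sess.run(tf.compat.v1.tpu.initialize_system())
    saver = tf.compat.v1.train.import_meta_graph(os.path.join(path, meta_path))
    saver.restore(sess, os.path.join(path, checkpoint_path))
    saver.save(sess, "model.ckpt")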
Related
import torch

# set the computation device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Load model checkpoint
checkpoint = 'checkpoints/checkpoint_ssd300.pth.tar'
checkpoint = torch.load(checkpoint)
start_epoch = checkpoint['epoch'] + 1
print('\nLoaded checkpoint from epoch %d.\n' % start_epoch)
model = checkpoint['model']
model = model.to(device)
model.eval()
When I try to run this code block, I get the following problem:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
The error message is indicating that you are trying to load a model checkpoint that was trained on a GPU (CUDA device), but your current machine does not have a GPU or CUDA is not available.
The line device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') is trying to determine whether CUDA is available on the current machine, and if it is, it sets the device variable to 'cuda', otherwise it sets it to 'cpu'.
The line checkpoint = torch.load(checkpoint) tries to load the model checkpoint from the specified file, but because the tensors in the checkpoint were saved from a CUDA device, torch.load tries by default to restore them onto 'cuda', which causes the error on a CPU-only machine.
To resolve this issue, you can use the map_location argument of the torch.load function to specify that the model should be loaded on the 'cpu' device, instead of the 'cuda' device.
checkpoint = torch.load(checkpoint, map_location=torch.device('cpu'))
This way the model will be loaded on the CPU device, even if a CUDA device was used to train it.
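Putting it together, a minimal sketch of the corrected loading code (same checkpoint path as in the snippet above) would be:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# map_location remaps the CUDA-saved storages onto the CPU
checkpoint = torch.load('checkpoints/checkpoint_ssd300.pth.tar',
                        map_location=torch.device('cpu'))

start_epoch = checkpoint['epoch'] + 1
print('\nLoaded checkpoint from epoch %d.\n' % start_epoch)

model = checkpoint['model']
model = model.to(device)  # move to whichever device is actually available
model.eval()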
I'm using a Kaggle TPU to train a TensorFlow CycleGAN model. Everything is fine after training starts, but training freezes randomly after a few epochs. RAM has not blown up during training according to Kaggle.
I've run into warnings during training such as:
2022-11-28 07:22:58.323282: W ./tensorflow/core/distributed_runtime/eager/destroy_tensor_handle_node.h:57] Ignoring an error encountered when deleting remote tensors handles: Invalid argument: Unable to find the relevant tensor remote_handle: Op ID: 89987, Output num: 0
Additional GRPC error information from remote target /job:worker/replica:0/task:0:
:{"created":"#1669620178.323159560","description":"Error received from peer ipv4:10.0.0.2:8470","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Unable to find the relevant tensor remote_handle: Op ID: 89987, Output num: 0","grpc_status":3}
Epoch 5/200
When I configure the TPUs I get warnings such as:
2022-11-28 13:56:35.038036: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-28 13:56:35.040789: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/conda/lib
2022-11-28 13:56:35.040821: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-28 13:56:35.040850: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (06e37d3ac4e4): /proc/driver/nvidia/version does not exist
2022-11-28 13:56:35.043518: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-28 13:56:35.044759: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-28 13:56:35.079672: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> 10.0.0.2:8470}
2022-11-28 13:56:35.079743: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job localhost -> {0 -> localhost:30020}
2022-11-28 13:56:35.098707: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> 10.0.0.2:8470}
2022-11-28 13:56:35.098760: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job localhost -> {0 -> localhost:30020}
2022-11-28 13:56:35.101231: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:411] Started server with target: grpc://localhost:30020
The TensorFlow version is 2.4.1; I haven't touched other configs. My model.fit call looks like this:
history = gan_model.fit(gan_ds,
                        epochs=EPOCHS,
                        callbacks=[GANMonitor()],
                        steps_per_epoch=(max(n_monet_samples, n_photo_samples) // BATCH_SIZE),
                        verbose=2,
                        workers=0).history
Most of the code comes from a Kaggle tutorial, but I've changed the model architecture. Is there a way to solve this issue?
🙏
I've tried configuring it to verbose=1 and saw that training freezes on a random step in the middle of an epoch. The number of epochs I'm able to get through seems to depend on the model architecture and batch size, so I think there's some issue with memory?
I tried to run the two tutorials below on a v3-8 and encountered similar warnings in both runs.
https://www.kaggle.com/code/philculliton/a-simple-petals-tf-2-2-notebook
https://www.kaggle.com/code/amyjang/monet-cyclegan-tutorial
But they didn't break the training.
Could you please check if the original tutorial code runs for a significant number of epochs? If yes, you might need to review your changes to the model architecture.
Also, if batch_size is affecting how many epochs you can train, then it is most probably an out-of-memory error. Try reducing the batch_size, preferably to a factor of 128 per core, and see if the run completes.
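For example, one common way to size the batch (a sketch only; strategy and the per-core value are placeholders rather than your code) is to derive the global batch size from a per-core batch size:

# Hypothetical sizing: scale a per-core batch size by the number of TPU cores (8 on a v3-8)
BATCH_SIZE_PER_REPLICA = 16  # reduce this if the run still dies
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync

# assuming gan_ds is built from an unbatched tf.data pipeline
gan_ds = gan_ds.batch(BATCH_SIZE, drop_remainder=True)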
More resources -
How improper batch_size can lead to OOM - https://cloud.google.com/tpu/docs/performance-guide#xla-efficiencies
Profiling guide - https://cloud.google.com/tpu/docs/cloud-tpu-tools
Feel free to explore our in-depth guides on TPUs with excellent tutorials - https://cloud.google.com/tpu/docs/intro-to-tpu
I'm trying to use the Fashionpedia model checkpoints for their clothing attribute detection models. I'm on TensorFlow 2.10.0, I'm only familiar with using Sequential models, and I have never used a tf Session. As I understand it, the only way to restore a model in this format is to use a session and import the meta graph. I have the following code:
import os
import tensorflow as tf

tf.compat.v1.disable_eager_execution()
model_path = r'C:\Users\tbrad\AppData\Local\Programs\Python\Python37\Fashionpedia\fashionpedia_model_checkpoints\fashionpedia-r101-fpn\\'

# Rebuild the graph from the .meta file and restore the weights into it
sess = tf.compat.v1.Session()
saver = tf.compat.v1.train.import_meta_graph(os.path.join(model_path, 'model.ckpt.meta'))
saver.restore(sess, os.path.join(model_path, 'model.ckpt'))
And it gives the following error:
Detected at node 'input0' defined at (most recent call last): Node: 'input0'
No OpKernel was registered to support Op 'TPUReplicatedInput' used by {{node input0}} with these attrs: [is_mirrored_variable=false, index=0, T=DT_INT32, N=32, is_packed=false]
Registered devices: [CPU, GPU]
Registered kernels:
<no registered kernels>
I have very little understanding of TPUs. Could anyone explain what is actually causing this error? Also, is there any way to extract the information from these files to create a TensorFlow model object, instead of having to import the meta graph and use a session?
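For what it's worth, the contents of the checkpoint can at least be listed without rebuilding the graph; a minimal sketch, assuming the same model_path as above:

import os
import tensorflow as tf

# Print every variable name and shape stored in the checkpoint (no graph or session needed)
ckpt_prefix = os.path.join(model_path, 'model.ckpt')
for name, shape in tf.train.list_variables(ckpt_prefix):
    print(name, shape)

# Individual tensors can then be read through a checkpoint reader
reader = tf.train.load_checkpoint(ckpt_prefix)
weights = reader.get_tensor(name)  # 'name' is any variable name printed above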
I am trying to parallelize a model with an embedding layer on TensorFlow 2.4.1, but it is throwing the following error:
InvalidArgumentError: Cannot assign a device for operation sequential/emb_layer/embedding_lookup/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential/emb_layer/embedding_lookup/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
GatherV2: GPU CPU XLA_CPU XLA_GPU
Cast: GPU CPU XLA_CPU XLA_GPU
Const: GPU CPU XLA_CPU XLA_GPU
ResourceSparseApplyAdagradV2: CPU
_Arg: GPU CPU XLA_CPU XLA_GPU
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU
Colocation members, user-requested devices, and framework assigned devices, if any:
sequential_emb_layer_embedding_lookup_readvariableop_resource (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
adagrad_adagrad_update_update_0_resourcesparseapplyadagradv2_accum (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
sequential/emb_layer/embedding_lookup/ReadVariableOp (ReadVariableOp)
sequential/emb_layer/embedding_lookup/axis (Const)
sequential/emb_layer/embedding_lookup (GatherV2)
gradient_tape/sequential/emb_layer/embedding_lookup/Shape (Const)
gradient_tape/sequential/emb_layer/embedding_lookup/Cast (Cast)
Adagrad/Adagrad/update/update_0/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0
[[{{node sequential/emb_layer/embedding_lookup/ReadVariableOp}}]] [Op:__inference_train_function_631]
I simplified the model down to a basic one to make it reproducible:
import tensorflow as tf

central_storage_strategy = tf.distribute.MirroredStrategy()
with central_storage_strategy.scope():
    user_model = tf.keras.Sequential([
        tf.keras.layers.Embedding(10, 2, name="emb_layer")
    ])

user_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1), loss="mse")
user_model.fit([1], [[1, 2]], epochs=3)
Any help will be highly appreciated. Thanks!
So I finally figured out the problem, in case anyone is looking for an answer.
TensorFlow does not have a complete GPU implementation of the Adagrad optimizer as of now. The ResourceSparseApplyAdagradV2 op, which the embedding layer's sparse gradient update relies on, has no GPU kernel and errors out on GPU, so Adagrad cannot be used with an embedding layer under data-parallelism strategies. Using Adam or RMSprop instead works fine.
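For example, a minimal sketch based on the reproduction above, with only the optimizer swapped from Adagrad to Adam, should run without the colocation error:

import tensorflow as tf

central_storage_strategy = tf.distribute.MirroredStrategy()
with central_storage_strategy.scope():
    user_model = tf.keras.Sequential([
        tf.keras.layers.Embedding(10, 2, name="emb_layer")
    ])

# Adam's sparse update is supported on GPU, unlike ResourceSparseApplyAdagradV2
user_model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss="mse")
user_model.fit([1], [[1, 2]], epochs=3)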
I'm a noob when it comes to Python and machine learning. I'm trying to run two different projects that have to do with something called Deep Image Matting:
https://github.com/Joker316701882/Deep-Image-Matting with Tensorflow
https://github.com/huochaitiantang/pytorch-deep-image-matting with Pytorch
I'm just trying to run the tests in these projects, but I run into various problems. Can I run these on a machine without a GPU? I thought that a GPU is only for speeding up processing, and for now I'm only interested in seeing these run before getting a machine with a GPU.
I apologize in advance, as I know I'm a total noob in this
When I try the Tensorflow project:
I get an error with this line: gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = args.gpu_fraction), probably because I was on TF2 and this requires TF1.
After I downgraded to TF1, when I try to run the test I get:
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
and:
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'MaxPoolWithArgmax' with these attrs. Registered devices: [CPU], Registered kernels:
<no registered kernels>
and now I'm stuck because I have no clue what this means.
When I try the Pytorch project:
First I get this error: RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
So I added map_location=torch.device('cpu') when the model is loaded, but now I get:
RuntimeError: Error(s) in loading state_dict for VGG16:
size mismatch for conv6_1.weight: copying a param with shape torch.Size([512, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
And I'm stuck again.
Can someone help?
Thank you in advance!
For the PyTorch one, there were two problems and it looks like you've solved the first one on your own with map_location. The second problem is that the weights in your checkpoint and the weights in your model don't have the same shape! A quick detour to the github repo; let's visit net.py in core. Take a look at lines 26 to 28:
# model released before 2019.09.09 should use kernel_size=1 & padding=0
# self.conv6_1 = nn.Conv2d(512, 512, kernel_size=1, padding=0,bias=True)
self.conv6_1 = nn.Conv2d(512, 512, kernel_size=3, padding=1,bias=True)
I'm guessing the checkpoint contains weights where conv6_1 has a kernel size of 1 rather than 3, matching the commented-out line of code. So try uncommenting the line with kernel_size=1 and commenting out the line with kernel_size=3.
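In other words, the block in net.py would end up looking like this (only the comment markers move; this matches the checkpoint's [512, 512, 1, 1] shape for conv6_1.weight):

# model released before 2019.09.09 should use kernel_size=1 & padding=0
self.conv6_1 = nn.Conv2d(512, 512, kernel_size=1, padding=0, bias=True)
# self.conv6_1 = nn.Conv2d(512, 512, kernel_size=3, padding=1, bias=True)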