I am training a model and have put the dataset inside the data folder. The structure looks like this:
--data
-----mars
---------bbox_train
---------bbox_test
---------info
Many developers said that this is a label problem, but I am not sure, because the labels are in the right place.
Args:Namespace(arch='resnet50graphpoolparthyper', concat=False, dataset='mars', dropout=0.1, eval_step=100, evaluate=False, gamma=0.1, gpu_devices='0', height=256, htri_only=False, lr=0.0003, margin=0.3, max_epoch=800, nheads=8, nhid=512, num_instances=4, part1=4, part2=8, part3=2, pool='avg', pretrained_model='/home/jiyang/Workspace/Works/video-person-reid/3dconv-person-reid/pretrained_models/resnet-50-kinetics.pth', print_freq=80, save_dir='log_hypergraphsagepart', seed=1, seq_len=8, start_epoch=0, stepsize=200, test_batch=1, train_batch=32, use_cpu=False, warmup=True, weight_decay=0.0005, width=128, workers=4, xent_only=False)
==========
Currently using GPU 0
Initializing dataset mars
=> MARS loaded
Dataset statistics:
------------------------------
subset | # ids | # tracklets
------------------------------
train | 625 | 8298
query | 626 | 1980
gallery | 622 | 9330
------------------------------
total | 1251 | 19608
number of images per tracklet: 2 ~ 920, average 59.5
------------------------------
Initializing model: resnet50graphpoolparthyper
Model size: 44.17957M
==> Epoch 1/800 lr:1.785e-05
Traceback (most recent call last):
File "main_video_person_reid_hypergraphsage_part.py", line 357, in <module>
main()
File "main_video_person_reid_hypergraphsage_part.py", line 220, in main
train(model, criterion_xent, criterion_htri, optimizer, trainloader, use_gpu)
File "main_video_person_reid_hypergraphsage_part.py", line 257, in train
outputs, features = model(imgs)
File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/khawar/HDD_Khawar1/hypergraph_reid/models/ResNet_hypergraphsage_part.py", line 621, in forward
x = self.base(x)
File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/khawar/HDD_Khawar1/hypergraph_reid/models/resnet.py", line 213, in forward
x = self.conv1(x)
File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/khawar/anaconda3/envs/hypergraph_reid/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Installing torch 1.8 with CUDA 11.1 via the following command fixed the issue:
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
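After reinstalling, a quick sanity check (a sketch using standard PyTorch calls) confirms that the CUDA build and cuDNN are actually visible:
import torch

print(torch.__version__)                    # expect 1.8.0+cu111
print(torch.version.cuda)                   # expect 11.1
print(torch.cuda.is_available())            # expect True
print(torch.backends.cudnn.is_available())  # expect True
print(torch.backends.cudnn.version())       # e.g. 8005 for cuDNN 8.0.5
If any of these come back False or None, the CUDNN_STATUS_NOT_INITIALIZED error will reappear regardless of the model code.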
I have a simple 2-layer TensorFlow model that I am trying to train on a dataset of equal-sized stereo audio files to tell me whether the sound is coming more from the left side or the right side. This means the input is an array of 3072-by-2 arrays, and the output is an array of 1s and 0s representing left and right.
The problem is that when I run the program, it fails at model.fit() with an invalid argument error.
Code:
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 18 15:51:56 2022
@author: andre
"""
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
from sklearn.model_selection import train_test_split
from datetime import datetime
from sklearn import metrics
from scipy.io import wavfile
import os
import glob
# Load in Right Side .WAV Data.
X1 = []
count1 = 0
database_path = "C:\\Users\\andre\\OneDrive\\Documents\\ESI2022\\MLDatabases\\Right\\"
for filename in glob.glob(os.path.join(database_path, '*.wav')):
    X1.append(wavfile.read(filename)[1])
    count1 = count1 + 1
# Load in Left side .WAV Data.
X2 = []
count2 = 0
database_path2 = "C:\\Users\\andre\\OneDrive\\Documents\\ESI2022\\MLDatabases\\Left\\"
for filename2 in glob.glob(os.path.join(database_path2, '*.wav')):
    X2.append(wavfile.read(filename2)[1])
    count2 = count2 + 1
# Get the smallest size audio file (this will be sample size input to model)
sample_size = len(X1[0])
for data in X1:
    if len(data) < sample_size:
        sample_size = len(data)
# Make audio data into equal size chunks
X1e = []
for i in X1:
    num_chunks = len(i)//sample_size
    for j in range(num_chunks):
        X1e.append(i[j*sample_size:(j+1)*sample_size])
X1 = X1e
X2e = []
for i in X2:
    num_chunks = len(i)//sample_size
    for j in range(num_chunks):
        X2e.append(i[j*sample_size:(j+1)*sample_size])
X2 = X2e
del X1e
del X2e
# Create Output data that is the same length as the input data.
Y1 = np.ones(len(X1), dtype='float32').tolist()
Y2 = np.zeros(len(X2), dtype='float32').tolist()
# Concatenate Left and Right .WAV data and output data as numpy arrays.
X1.extend(X2)
X = np.asarray(X1)
Y = np.asarray(Y1+Y2).astype(np.int16)
#X=list(X)
#Y=list(Y)
# Split data into test and training data.
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2,random_state=0,shuffle=True)
'''
print(X[1])
time = np.linspace(0.,33792, 33792)
plt.plot(time, X[1][:,1], label="Left channel")
plt.plot(time, X[1][:,0], label="Right channel")
plt.legend()
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.show()
'''
# Create the Model
model = Sequential()
# Add an LSTM layer with 1 output and ambiguous input data length.
model.add(layers.LSTM(1,batch_input_shape=(1,sample_size,2),return_sequences=True))
model.add(layers.LSTM(1,return_sequences=False))
# Compile Model
#history = model.compile(loss='mean_absolute_error', metrics=['accuracy'],optimizer='adam',output='sparse_categorical_crossentropy')
optimizer = Adam(learning_rate=2*1e-4)
'''
history = model.compile(optimizer=optimizer, loss={
'output': 'sparse_categorical_crossentropy', },
metrics={
'output': 'sparse_categorical_accuracy', },
sample_weight_mode='temporal')
'''
history = model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="adam",
    metrics=["accuracy"],
)
model.summary()
# Define Training Parameters
num_epochs = 200
num_batch_size = 1
# Save the most accurate model to file. (Verbosity gives more information.)
checkpointer = ModelCheckpoint(filepath="SavedModels/checkpointModel.hdf5", verbose=1,save_best_only=True)
# Start the timer
start = datetime.now()
# Train the model
model.fit(X_train,Y_train,batch_size=num_batch_size, epochs=num_epochs, validation_data=(X_test,Y_test), callbacks=[checkpointer],verbose=1)
# Get and Print Model Validation Accuracy
test_accuracy=model.evaluate(X_test,Y_test,verbose=0)
print(test_accuracy[1])
Output & error:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_2 (LSTM) (1, 3072, 1) 16
lstm_3 (LSTM) (1, 1) 12
=================================================================
Total params: 28
Trainable params: 28
Non-trainable params: 0
_________________________________________________________________
Epoch 1/200
2022-02-07 09:40:36.348127: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-02-07 09:40:36.348459: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-02-07 09:40:43.978976: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-02-07 09:40:43.979029: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-02-07 09:40:43.985710: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-0FFTIDB
2022-02-07 09:40:43.986092: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-0FFTIDB
2022-02-07 09:40:43.990164: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-07 09:40:48.470415: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at sparse_xent_op.cc:103 : INVALID_ARGUMENT: Received a label value of 1 which is outside the valid range of [0, 1). Label values: 1
2022-02-07 09:58:29.070767: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at sparse_xent_op.cc:103 : INVALID_ARGUMENT: Received a label value of 1 which is outside the valid range of [0, 1). Label values: 1
Traceback (most recent call last):
File "C:\Users\andre\OneDrive\Documents\ESI2022\PythonScripts\BeltML\testML.py", line 127, in <module>
model.fit(X_train,Y_train,batch_size=num_batch_size, epochs=num_epochs, validation_data=(X_test,Y_test), callbacks=[checkpointer],verbose=1)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\eager\execute.py", line 58, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
InvalidArgumentError: Received a label value of 1 which is outside the valid range of [0, 1). Label values: 1
[[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits
(defined at C:\ProgramData\Anaconda3\lib\site-packages\keras\backend.py:5113)
]] [Op:__inference_train_function_9025]
Errors may have originated from an input operation.
Input Source operations connected to node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits:
In[0] sparse_categorical_crossentropy/Reshape_1 (defined at C:\ProgramData\Anaconda3\lib\site-packages\keras\backend.py:5109)
In[1] sparse_categorical_crossentropy/Reshape (defined at C:\ProgramData\Anaconda3\lib\site-packages\keras\backend.py:3561)
Operation defined at: (most recent call last)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\console\__main__.py", line 23, in <module>
start.main()
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\console\start.py", line 328, in main
kernel.start()
File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 677, in start
self.io_loop.start()
File "C:\ProgramData\Anaconda3\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
self.asyncio_loop.run_forever()
File "C:\ProgramData\Anaconda3\lib\asyncio\base_events.py", line 570, in run_forever
self._run_once()
File "C:\ProgramData\Anaconda3\lib\asyncio\base_events.py", line 1859, in _run_once
handle._run()
File "C:\ProgramData\Anaconda3\lib\asyncio\events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 457, in dispatch_queue
await self.process_one()
File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 446, in process_one
await dispatch(*args)
File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 353, in dispatch_shell
await result
File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 648, in execute_request
reply_content = await reply_content
File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 353, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2901, in run_cell
result = self._run_cell(
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2947, in _run_cell
return runner(coro)
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3172, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3364, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "C:\Users\andre\AppData\Local\Temp/ipykernel_3604/1229251547.py", line 1, in <module>
runfile('C:/Users/andre/OneDrive/Documents/ESI2022/PythonScripts/BeltML/testML.py', wdir='C:/Users/andre/OneDrive/Documents/ESI2022/PythonScripts/BeltML')
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 577, in runfile
exec_code(file_code, filename, ns_globals, ns_locals,
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 465, in exec_code
exec(compiled, ns_globals, ns_locals)
File "C:\Users\andre\OneDrive\Documents\ESI2022\PythonScripts\BeltML\testML.py", line 127, in <module>
model.fit(X_train,Y_train,batch_size=num_batch_size, epochs=num_epochs, validation_data=(X_test,Y_test), callbacks=[checkpointer],verbose=1)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1216, in fit
tmp_logs = self.train_function(iterator)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 878, in train_function
return step_function(self, iterator)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 867, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 860, in run_step
outputs = model.train_step(data)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 809, in train_step
loss = self.compiled_loss(
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\compile_utils.py", line 201, in __call__
loss_value = loss_obj(y_t, y_p, sample_weight=sw)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\losses.py", line 141, in __call__
losses = call_fn(y_true, y_pred)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\losses.py", line 245, in call
return ag_fn(y_true, y_pred, **self._fn_kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\losses.py", line 1737, in sparse_categorical_crossentropy
return backend.sparse_categorical_crossentropy(
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend.py", line 5113, in sparse_categorical_crossentropy
res = tf.nn.sparse_softmax_cross_entropy_with_logits(
According to the documentation, the labels argument must be a batch_size vector with values in [0, num_classes).
From your logs:
received label value of 1 which is outside the valid range of [0, 1)
The framework presumably infers that you have only one class, because your neural network also has just 1 output. To apply that SparseSoftmaxCrossEntropyWithLogits loss with labels of 0 and 1, the network needs 2 outputs.
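A minimal sketch of that change, reusing the imports and sample_size from the question (the LSTM width of 8 is an arbitrary assumption, not from the original post):
# Two logits so that sparse labels 0 and 1 fall inside the valid range [0, 2).
model = Sequential()
model.add(layers.LSTM(8, batch_input_shape=(1, sample_size, 2)))
model.add(layers.Dense(2))
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="adam",
    metrics=["accuracy"],
)
Alternatively, keep a single output with a sigmoid activation and switch the loss to binary_crossentropy; either way, the number of outputs has to match what the loss expects.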
I am getting this error while running the training code of a model.
Traceback (most recent call last):
File "train.py", line 273, in <module>
train_loss[epoch - 1] = process_epoch(
File "train.py", line 240, in process_epoch
loss = loss_fn(model, batch)
File "train.py", line 221, in <lambda>
loss_fn = lambda model, batch: weak_loss(model, batch, normalization="softmax")
File "train.py", line 171, in weak_loss
corr4d = model(batch).to("cuda")
File "/home/srtf/anaconda3/envs/ncnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/srtf/ncnet/lib/model.py", line 263, in forward
feature_A = self.FeatureExtraction(tnf_batch['source_image'])
File "/home/srtf/anaconda3/envs/ncnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/srtf/ncnet/lib/model.py", line 84, in forward
features = self.model(image_batch)
File "/home/srtf/anaconda3/envs/ncnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/srtf/anaconda3/envs/ncnet/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/srtf/anaconda3/envs/ncnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/srtf/anaconda3/envs/ncnet/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 353, in forward
return self._conv_forward(input, self.weight)
File "/home/srtf/anaconda3/envs/ncnet/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 349, in _conv_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
CUDA is available on the system. Where do I need to make changes in the code?
Your input needs to be sent to the correct device:
>>> corr4d = model(batch.cuda())
which will copy the batch to the GPU device ('cuda:0' by default).
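A more device-agnostic pattern (a sketch, not the repository's exact code) resolves the device once and moves both the model and each batch onto it:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)   # parameters and buffers move to the GPU
batch = batch.to(device)   # inputs must live on the same device as the weights
corr4d = model(batch)
This avoids the FloatTensor/cuda.FloatTensor mismatch because the input and the weights are guaranteed to share a device.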
I'm trying to use a sigmoid to join the outputs of two models with different embedding matrices, but I keep getting an error at the concatenate line. I have tried suggestions from similar questions but it keeps giving the same error. I feel I'm missing something but I can't find it. Please help explain. Thanks.
############################ MODEL 1 ######################################
input_tensor = Input(shape=(35,))
input_layer = Embedding(vocab_size, 300, input_length=35, weights=[embedding_matrix], trainable=True)(input_tensor)
conv_blocks = []
filter_sizes = (2, 3, 4)
for fx in filter_sizes:
    conv_layer = Conv1D(100, kernel_size=fx, activation='relu', data_format='channels_first')(input_layer)  # filters=100, kernel_size=fx
    maxpool_layer = MaxPooling1D(pool_size=4)(conv_layer)
    flat_layer = Flatten()(maxpool_layer)
    conv_blocks.append(flat_layer)
conc_layer = concatenate(conv_blocks, axis=1)
graph = Model(inputs=input_tensor, outputs=conc_layer)
model = Sequential()
model.add(graph)
model.add(Dropout(0.2))
############################ MODEL 2 ######################################
input_tensor_1 = Input(shape=(35,))
input_layer_1 = Embedding(vocab_size, 300, input_length=35, weights=[embedding_matrix_1], trainable=True)(input_tensor_1)
conv_blocks_1 = []
filter_sizes_1 = (2, 3, 4)
for fx in filter_sizes_1:
    conv_layer_1 = Conv1D(100, kernel_size=fx, activation='relu', data_format='channels_first')(input_layer_1)  # filters=100, kernel_size=fx
    maxpool_layer_1 = MaxPooling1D(pool_size=4)(conv_layer_1)
    flat_layer_1 = Flatten()(maxpool_layer_1)
    conv_blocks_1.append(flat_layer_1)
conc_layer_1 = concatenate(conv_blocks_1, axis=1)
graph_1 = Model(inputs=input_tensor_1, outputs=conc_layer_1)
model_1 = Sequential()
model_1.add(graph_1)
model_1.add(Dropout(0.2))
fused = concatenate([graph, graph_1], axis=-1)
prediction = Dense(3, activation='sigmoid')(fused)
model = Model(inputs=[input_tensor,input_tensor_1], outputs=[prediction])
model.compile(loss='sparse_categorical_crossentropy',optimizer='Adagrad', metrics=['accuracy'])
model.summary()
This is the error trace:
Traceback (most recent call last):
File "DL_Ensemble.py", line 145, in <module>
fused = concatenate([graph, graph_1], axis= 1 )
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/keras/layers/merge.py", line 705, in concatenate
return Concatenate(axis=axis, **kwargs)(inputs)
File "/usr/pkg/lib/python3.8/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 887, in __call__
self._maybe_build(inputs)
File "/usr/pkg/lib/python3.8/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 2141, in _maybe_build
self.build(input_shapes)
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/keras/utils/tf_utils.py", line 306, in wrapper
output_shape = fn(instance, input_shape)
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/keras/layers/merge.py", line 378, in build
raise ValueError('A `Concatenate` layer should be called '
ValueError: A `Concatenate` layer should be called on a list of at least 2 inputs
UPDATE: I have applied the answer given by @VivekMehta; however, I now get this error.
File "DL_Ensemble.py", line 165, in <module>
model.fit([train_sequences,train_sequences], train_y, epochs=10,
verbose=False, batch_size=32, class_weight={0: 6.0, 1: 1.0, 2: 2.0})
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/keras/engine/training.py", line 709, in fit
return func.fit(
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/keras/engine/training_v2.py", line 313, in fit
training_result = run_one_epoch(
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
batch_outs = execution_function(iterator)
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/keras/engine/training_v2_utils.py",
line
86, in execution_function
distributed_function(input_fn))
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
result = self._call(*args, **kwds)
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
return self._stateless_fn(*args, **kwds)
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
return graph_function._filtered_call(args, kwargs) # pylint:
disable=protected-access
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/eager/function.py", line 1137, in _filtered_call
return self._call_flat(
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/eager/function.py", line 1223, in _call_flat
flat_outputs = forward_function.call(
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/eager/function.py", line 506, in call
outputs = execute.execute(
File "/usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError:
Conv2DCustomBackpropInputOp only supports NHWC.
[[node Conv2DBackpropInput (defined at /usr/pkg/lib/python3.8/site-
packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_2250]
Function call stack:
distributed_function
I also wanted to add that when the code is run on a GPU, as opposed to a CPU, the error occurs on the same line as before but the message changes to:
File "DL_Ensemble.py", line 166, in <module>
model.fit([train_sequences,train_sequences], train_y, epochs=10, verbose=False, batch_size=32, class_weight={0: 6.0, 1: 1.0, 2: 2.0})
File "/home/kosimadukwe/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
validation_steps=validation_steps)
File "/home/kosimadukwe/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 329, in model_iteration
batch_outs = f(ins_batch)
File "/home/kosimadukwe/.local/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3073, in __call__
self._make_callable(feed_arrays, feed_symbols, symbol_vals, session)
File "/home/kosimadukwe/.local/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3019, in _make_callable
callable_fn = session._make_callable_from_options(callable_opts)
File "/home/kosimadukwe/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1471, in _make_callable_from_options
return BaseSession._Callable(self, callable_options)
File "/home/kosimadukwe/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1425, in __init__
session._session, options_ptr, status)
File "/home/kosimadukwe/.local/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
[[{{node training/Adagrad/gradients/conv1d_5/conv1d/Conv2D_grad/Conv2DBackpropInput}}]]
Exception ignored in: <function BaseSession._Callable.__del__ at 0x7fe4dd06a730>
Traceback (most recent call last):
File "/home/kosimadukwe/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1455, in __del__
self._session._session, self._handle, status)
File "/home/kosimadukwe/.local/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No such callable handle: 94697914208640
So from your stack trace, the code is throwing the error at:
fused = concatenate([graph, graph_1], axis= 1 )
print(type(graph))
# output: <class 'tensorflow.python.keras.engine.training.Model'>
This error occurs because concatenate expects a list of tensors to concatenate, while you are passing graph and graph_1, which are not tensors but Model instances.
So from your code I assume that you want to concatenate the outputs of these two models. In that case you'll have to change the above line to:
fused = concatenate([graph.outputs[0], graph_1.outputs[0]], axis=-1)
Here, graph.outputs gives the list of outputs of the Model. Since each model gives one output, we take index 0 from each.
Change this part and you'll get the model summary you are expecting.
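With that change, the tail of the model from the question would read (everything else kept as posted):
# Concatenate the two models' output tensors, not the Model objects themselves.
fused = concatenate([graph.outputs[0], graph_1.outputs[0]], axis=-1)
prediction = Dense(3, activation='sigmoid')(fused)
model = Model(inputs=[input_tensor, input_tensor_1], outputs=[prediction])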
I was running this code (https://github.com/monkut/tensorflow_chatbot, main code in execute.py) on Windows 7 with Python 3.5 and TensorFlow r0.12 (CPU), and an error occurred after just 300 steps. Then I tried changing the vocabulary size to 30000 and setting a checkpoint every 100 steps. With 1 layer of 128 units the error occurred after 3900 steps, and with 3 layers of 256 units it occurred after 5400 steps.
What kind of error is that? Is there a way to solve it?
Error:
>> Mode : train
Preparing data in working_dir/
Creating vocabulary working_dir/vocab20000.enc from data/train.enc
processing line 100000
>> Full Vocabulary Size : 45408
>>>> Vocab Truncated to: 20000
Creating vocabulary working_dir/vocab20000.dec from data/train.dec
processing line 100000
>> Full Vocabulary Size : 44271
>>>> Vocab Truncated to: 20000
Tokenizing data in data/train.enc
tokenizing line 100000
Tokenizing data in data/train.dec
tokenizing line 100000
Tokenizing data in data/test.enc
Creating 3 layers of 256 units.
Created model with fresh parameters.
Reading development and training data (limit: 0).
reading data line 100000
global step 300 learning rate 0.5000 step-time 3.34 perplexity 377.45
eval: bucket 0 perplexity 96.25
eval: bucket 1 perplexity 210.94
eval: bucket 2 perplexity 267.86
eval: bucket 3 perplexity 365.77
Traceback (most recent call last):
File "C:\Python35 64\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
return fn(*args)
File "C:\Python35 64\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
status, run_metadata)
File "C:\Python35 64\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Python35 64\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[40] = 20000 is not in [0, 20000)
[[Node: model_with_buckets/sequence_loss_3/sequence_loss_by_example/sampled_softmax_loss_28/embedding_lookup_1 = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@proj_b"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](proj_b/read, model_with_buckets/sequence_loss_3/sequence_loss_by_example/sampled_softmax_loss_28/concat)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "execute.py", line 352, in <module>
train()
File "execute.py", line 180, in train
target_weights, bucket_id, False)
File "C:\Users\Администратор\Downloads\tensorflow_chatbot-master (1)\tensorflow_chatbot-master\seq2seq_model.py", line 230, in step
outputs = session.run(output_feed, input_feed)
File "C:\Python35 64\lib\site-packages\tensorflow\python\client\session.py", line 766, in run
run_metadata_ptr)
File "C:\Python35 64\lib\site-packages\tensorflow\python\client\session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "C:\Python35 64\lib\site-packages\tensorflow\python\client\session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "C:\Python35 64\lib\site-packages\tensorflow\python\client\session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[40] = 20000 is not in [0, 20000)
[[Node: model_with_buckets/sequence_loss_3/sequence_loss_by_example/sampled_softmax_loss_28/embedding_lookup_1 = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@proj_b"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](proj_b/read, model_with_buckets/sequence_loss_3/sequence_loss_by_example/sampled_softmax_loss_28/concat)]]
Caused by op 'model_with_buckets/sequence_loss_3/sequence_loss_by_example/sampled_softmax_loss_28/embedding_lookup_1', defined at:
File "execute.py", line 352, in <module>
train()
File "execute.py", line 148, in train
model = create_model(sess, False)
File "execute.py", line 109, in create_model
gConfig['learning_rate_decay_factor'], forward_only=forward_only)
File "C:\Users\Администратор\Downloads\tensorflow_chatbot-master (1)\tensorflow_chatbot-master\seq2seq_model.py", line 158, in __init__
softmax_loss_function=softmax_loss_function)
File "C:\Python35 64\lib\site-packages\tensorflow\python\ops\seq2seq.py", line 1130, in model_with_buckets
softmax_loss_function=softmax_loss_function))
File "C:\Python35 64\lib\site-packages\tensorflow\python\ops\seq2seq.py", line 1058, in sequence_loss
softmax_loss_function=softmax_loss_function))
File "C:\Python35 64\lib\site-packages\tensorflow\python\ops\seq2seq.py", line 1022, in sequence_loss_by_example
crossent = softmax_loss_function(logit, target)
File "C:\Users\Администратор\Downloads\tensorflow_chatbot-master (1)\tensorflow_chatbot-master\seq2seq_model.py", line 101, in sampled_loss
self.target_vocab_size)
File "C:\Python35 64\lib\site-packages\tensorflow\python\ops\nn.py", line 1412, in sampled_softmax_loss
name=name)
File "C:\Python35 64\lib\site-packages\tensorflow\python\ops\nn.py", line 1184, in _compute_sampled_logits
all_b = embedding_ops.embedding_lookup(biases, all_ids)
File "C:\Python35 64\lib\site-packages\tensorflow\python\ops\embedding_ops.py", line 110, in embedding_lookup
validate_indices=validate_indices)
File "C:\Python35 64\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1293, in gather
validate_indices=validate_indices, name=name)
File "C:\Python35 64\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 759, in apply_op
op_def=op_def)
File "C:\Python35 64\lib\site-packages\tensorflow\python\framework\ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Python35 64\lib\site-packages\tensorflow\python\framework\ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): indices[40] = 20000 is not in [0, 20000)
[[Node: model_with_buckets/sequence_loss_3/sequence_loss_by_example/sampled_softmax_loss_28/embedding_lookup_1 = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@proj_b"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](proj_b/read, model_with_buckets/sequence_loss_3/sequence_loss_by_example/sampled_softmax_loss_28/concat)]]
Using virtualenv and tensorflow-gpu 0.12.0 seems to have solved the problem for me.
The notation [) denotes an inclusive-exclusive interval: [ means the endpoint is included, ( means it is excluded, and the same goes for ] and ). For example, [0, 20000) runs from 0 inclusive up to, but not including, 20000. So the error indices[40] = 20000 is not in [0, 20000) means an embedding index of 20000 was encountered, while the valid indices only go from 0 to 19999.
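More concretely, a token id equal to the vocabulary size slipped into a batch. A quick hypothetical check of the tokenized data (assuming space-separated integer ids, one sentence per line; the file name here is a guess, not taken from the repository):
vocab_size = 20000

# Scan a tokenized data file for ids outside the valid range [0, vocab_size).
with open("working_dir/train.enc.ids20000") as f:  # hypothetical file name
    for lineno, line in enumerate(f, 1):
        for tok in (int(t) for t in line.split()):
            if not 0 <= tok < vocab_size:
                print("line %d: id %d outside [0, %d)" % (lineno, tok, vocab_size))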
I have fitted a Random Forest Classifier on my dataset containing 7 features and about 1 million rows or records.
Following is my code.
randForestClassifier=RandomForestClassifier(n_estimators=10,max_depth=3)
randForestClassifier.fit(X_train,y)
pred=randForestClassifier.predict(featues_test)
I am getting a MemoryError when I use the predict method of my classifier. How do I fix it?
Following is my complete log:
randForestClassifier.predict(featues_test)
Traceback (most recent call last):
File "<ipython-input-15-0b7612d6e958>", line 1, in <module>
randForestClassifier.predict(featues_test)
File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 462, in predict
proba = self.predict_proba(X)
File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 513, in predict_proba
for e in self.estimators_)
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 659, in __call__
self.dispatch(function, args, kwargs)
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 406, in dispatch
job = ImmediateApply(func, args, kwargs)
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 140, in __init__
self.results = func(*args, **kwargs)
File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 106, in _parallel_helper
return getattr(obj, methodname)(*args, **kwargs)
File "C:\Python27\lib\site-packages\sklearn\tree\tree.py", line 592, in predict_proba
proba = self.tree_.predict(X)
File "sklearn/tree/_tree.pyx", line 3207, in sklearn.tree._tree.Tree.predict (sklearn\tree\_tree.c:24468)
File "sklearn/tree/_tree.pyx", line 3209, in sklearn.tree._tree.Tree.predict (sklearn\tree\_tree.c:24340)
MemoryError
Yes, you are getting the MemoryError at randForestClassifier.predict(featues_test), as shown by the stack trace:
File "<ipython-input-15-0b7612d6e958>", line 1, in <module>
randForestClassifier.predict(featues_test)
The remaining lines of the stack trace show that the problem comes from sklearn's compiled tree code (sklearn\tree\_tree.c:24340).
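One common workaround (a sketch, not part of the original answer) is to score the test set in chunks, so that each predict call allocates a bounded amount of memory:
import numpy as np

def predict_in_chunks(clf, X, chunk_size=10000):
    # Predict chunk_size rows at a time and stitch the results back together.
    parts = [clf.predict(X[i:i + chunk_size])
             for i in range(0, len(X), chunk_size)]
    return np.concatenate(parts)

pred = predict_in_chunks(randForestClassifier, featues_test)
Since the model itself is small (10 trees of depth 3), the allocation that fails is most likely the buffer for scoring all rows at once, which chunking keeps small.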