I was wondering if it would be possible to extract the last cell state of an LSTM in Keras after training the model. For example, in this simple LSTM model:
number_of_dimensions = 128
number_of_examples = 123456
input_ = Input(shape = (10,100,))
lstm, hidden, cell = CuDNNLSTM(units = number_of_dimensions, return_state=True)(input_)
dense = Dense(num_of_classes, activation='softmax')(lstm)
model = Model(inputs = input_, outputs = dense)
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
# fit the model
parallel_model.fit(X1, onehot_encoded, epochs=100, verbose=1, batch_size = 128, validation_split = 0.2)
I tried printing 'cell' but the result was
<tf.Tensor 'cu_dnnlstm_2/strided_slice_17:0' shape=(?, 128) dtype=float32>
I would like to get the cell state as a NumPy array of shape (number_of_examples, number_of_dimensions), i.e. (123456, 128). Is it possible to do this in Keras?
Thank you!
Assuming that you are using TensorFlow as the backend, you can evaluate cell directly within the TensorFlow session. For example:
from keras.layers import LSTM, Input, Dense
from keras.models import Model
import keras.backend as K
import numpy as np
number_of_dimensions = 128
number_of_examples = 123456
input_ = Input(shape=(10, 100,))
lstm, hidden, cell = LSTM(units=number_of_dimensions, return_state=True)(input_)
dense = Dense(10, activation='softmax')(lstm)
model = Model(inputs=input_, outputs=dense)
# Grab the session Keras is using (don't close it; the model's graph lives in it)
sess = K.get_session()
x = np.zeros((number_of_examples, 10, 100))
cell_state = sess.run(cell, feed_dict={input_: x})
print(cell_state.shape)  # (123456, 128)
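Alternatively, if you would rather not touch the session directly, a Keras backend function should work as well. A minimal sketch (using the same input_, cell and x as above):
# Build a callable mapping the model input to the cell-state tensor
get_cell_state = K.function([input_], [cell])
cell_state = get_cell_state([x])[0]  # numpy array of shape (number_of_examples, number_of_dimensions)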
Another option you might be interested in is saving the model weights to an HDF5 file:
model.save_weights('my_model_weights.h5')
(ref: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model)
Then use an HDF viewer such as the Java HDFView package: https://support.hdfgroup.org/products/java/hdfview/
I believe you can then export the data to CSV for import into NumPy, for example.
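If you prefer to stay in Python rather than use a GUI viewer, the same file can be walked with h5py. A small exploratory sketch (the dataset paths depend on your layer names):
import h5py
with h5py.File('my_model_weights.h5', 'r') as f:
    def show(name, obj):
        # Print every dataset path and its shape, e.g. the LSTM kernel and bias
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape)
    f.visititems(show)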
Related
I'm using Keras and the functional API to create an encoder-decoder model, containing 2 LSTM layers each, for binary classification.
The shape of input to encoder x is (samples, time steps, in_features) = (126144, 1, 113)
The shape of labels y is (samples, time steps, out_features) = (126144, 1, 2)
x and y are both numpy arrays.
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense
import numpy as np
from numpy import array
from numpy import array_equal
from tensorflow.keras.layers import Lambda
from tensorflow.keras import backend as K
n_timesteps_in = 1
n_features = 113
out_features = 2
numberOfLSTMunits = 256
def create_hard_coded_decoder_input_model(batch_size):
    # The first part is the encoder
    encoder_inputs = Input(shape=(n_timesteps_in, n_features), name='encoder_inputs')
    encoder_lstm = LSTM(numberOfLSTMunits, return_state=True, return_sequences=True,
                        name='encoder_lstm')
    encoder_outputs, state_h1, state_c1 = encoder_lstm(encoder_inputs)
    # Second LSTM added
    encoder_lstm2 = LSTM(numberOfLSTMunits, return_state=True, name='encoder_lstm2')
    _, state_h2, state_c2 = encoder_lstm2(encoder_outputs)
    states = [state_h1, state_c1, state_h2, state_c2]
    # The decoder
    decoder_inputs = Input(shape=(1, out_features), name='decoder_inputs')
    decoder_lstm = LSTM(numberOfLSTMunits, return_sequences=True, return_state=True,
                        name='decoder_lstm')
    # Second LSTM
    decoder_lstm2 = LSTM(numberOfLSTMunits, return_sequences=True, return_state=True,
                         name='decoder_lstm2')
    decoder_dense = Dense(out_features, activation='softmax', name='decoder_dense')
    # Hard-coded decoder input
    all_outputs = []
    decoder_input_data = np.zeros((batch_size, 1, out_features))
    decoder_input_data[:, 0, 0] = -1
    inputs = decoder_input_data
    states1 = [state_h1, state_c1]
    states2 = [state_h2, state_c2]
    for _ in range(n_timesteps_in):
        # Run the decoder on one time step
        outputs, dh1, dc1 = decoder_lstm(inputs, initial_state=states1)
        final, dh2, dc2 = decoder_lstm2(outputs, initial_state=states2)
        outputs = decoder_dense(final)
        # Store the current prediction (we will concatenate all predictions later)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration,
        # and update the states
        inputs = outputs
        states1 = [dh1, dc1]
        states2 = [dh2, dc2]
    decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)
    model = Model(encoder_inputs, decoder_outputs, name='model_encoder_decoder')
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
    return model
I'm using 192 as the batch size.
After training, I save the model with this code:
model.save('lstm.h5')
When I load the model:
savedModel = load_model('lstm.h5')
I get this error:
/usr/local/lib/python3.7/dist-packages/keras/layers/recurrent.py in get_input_spec(shape)
547 batch_index, time_step_index = (1, 0) if self.time_major else (0, 1)
548 if not self.stateful:
--> 549 input_spec_shape[batch_index] = None
550 input_spec_shape[time_step_index] = None
551 return InputSpec(shape=tuple(input_spec_shape))
IndexError: list assignment index out of range
I've been trying to solve the problem for days, but nothing has worked. I'd really appreciate any help with it.
I'm creating an Ensemble of Vgg19, DenseNet, and EfficientNetB1.
The code is as follows:
IMAGE_SIZE = (224, 224, 3)
import tensorflow as tf
vgg19 = tf.keras.applications.vgg19.VGG19(
    input_shape=IMAGE_SIZE, weights='imagenet', include_top=False)
for layer in vgg19.layers:
    layer._name = layer._name + str('_19')
    layer.trainable = False
effnetb1 = tf.keras.applications.efficientnet.EfficientNetB1(
    include_top=False, weights='imagenet', input_shape=IMAGE_SIZE)
for layer in effnetb1.layers:
    layer._name = layer._name + str('_B1')
    layer.trainable = False
densenet = tf.keras.applications.densenet.DenseNet121(
    include_top=False, weights="imagenet", input_shape=IMAGE_SIZE)
for layer in densenet.layers:
    layer._name = layer._name + str('_Dense')
    layer.trainable = False
from keras.layers import Input, Flatten, Concatenate, Dense, Average, Dropout
inp = Input(IMAGE_SIZE)
vgg19_x = Flatten()(vgg19(inp))
vgg19_x = Dense(256, activation='relu')(vgg19_x)
effnet_x = Flatten()(effnetb1(inp))
effnet_x = Dense(256, activation='relu')(effnet_x)
densenet_x = Flatten()(densenet(inp))
densenet_x = Dense(256, activation='relu')(densenet_x)
from keras.models import Model
x = Concatenate()([vgg19_x, effnet_x, densenet_x])
x = Dense(128, activation='relu')(x)
x = Dropout(0.30)(x)
x = Dense(64, activation='relu')(x)
out = Dense(2, activation='softmax')(x)
model = Model(inputs=inp, outputs=out)
model.compile(
    loss='categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=0.0005,
        name="Adam"),
    metrics=['accuracy']
)
model.summary()
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
checkpointer = ModelCheckpoint(filepath="/content/drive/MyDrive/ensemble/ensemble-weights.hdf5", verbose=1, save_best_only=True)
r = model.fit(
    training_set,
    validation_data=test_set,
    epochs=30,
    steps_per_epoch=len(training_set),
    validation_steps=len(test_set),
    callbacks=[checkpointer]
)
The code runs fine and training proceeds successfully when I'm not using the callback. But when I use ModelCheckpoint, I get the following error after the 1st epoch:
ValueError: The target structure is of type `<class 'keras.engine.keras_tensor.KerasTensor'>`
KerasTensor(type_spec=TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name='input_5'), name=...
However, the input structure is a sequence (<class 'list'>) of length 0.
[]
nest cannot guarantee that it is safe to map one to the other.
Can anyone tell me what's wrong here? Also, is it because I'm concatenating three models?
Your help will be appreciated. Thank you!
I also ran into this issue while trying to implement a nested model (which is what you end up with here once you create the concatenated model).
The issue seems to be that Keras cannot handle the inputs and outputs of nested models in newer TensorFlow versions (TF 2.0 and above). Depending on the version you are on, you might want to explicitly refer to the input/output of the nested model you are using. In TF 2.6, what seems to work is to define separate models for each part, i.e. the common layers added after concatenation should also be wrapped in a model, like below (taken from here):
# Make a GradCAM heatmap following the Keras tutorial.
last_conv_layer = model.layers[-4].layers[-1]
last_conv_layer_model = keras.Model(model.layers[-4].inputs, last_conv_layer.output)
# Second, we create a model that maps the activations of the last conv
# layer to the final class predictions
classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])
x = classifier_input
for layer in model.layers[-3:]:
    x = layer(x)
classifier_model = keras.Model(classifier_input, x)
# Prepare the image with the preprocessing layers
preprocess_layers = keras.Model(model.inputs, model.layers[-5].output)
img_array = preprocess_layers(prepared_image)
# Then, we compute the gradient of the top predicted class for our input image
# with respect to the activations of the last conv layer
with tf.GradientTape() as tape:
    # Compute activations of the last conv layer and make the tape watch it
    last_conv_layer_output = last_conv_layer_model(img_array)
    tape.watch(last_conv_layer_output)
    # Compute class predictions
    preds = classifier_model(last_conv_layer_output)
    top_pred_index = tf.argmax(preds[0])
    top_class_channel = preds[:, top_pred_index]
# This is the gradient of the top predicted class with regard to
# the output feature map of the last conv layer
grads = tape.gradient(top_class_channel, last_conv_layer_output)
You can also check the following GitHub issues (they are not closely related, but deal with a similar problem): issue1, issue2, issue3
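One thing I would also rule out first in the ensemble code above (an assumption on my part, not something I have verified against your exact versions): the glue layers come from keras.layers while the base models come from tf.keras.applications, and mixing the two namespaces is a known source of KerasTensor errors. Keeping everything in tensorflow.keras would look like:
from tensorflow.keras.layers import Input, Flatten, Concatenate, Dense, Dropout
from tensorflow.keras.models import Model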
I am trying to run deep learning code that I found in a tutorial, in order to familiarise myself with ResNet50, Keras, and TensorFlow with Python 3.7. When I run my code, I get the following error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
I tried to use the following fix as mentioned on stack overflow:
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
Without any success. My full code can be seen below:
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
import numpy as np
from keras.preprocessing import image
from sklearn.linear_model import LogisticRegression
from tensorflow.python.framework.ops import disable_eager_execution
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# Download the architecture of ResNet50 with ImageNet weights
base_model = ResNet50(include_top=False, weights='imagenet')
# Taking the output of the last convolution block in ResNet50
x = base_model.output
# Adding a Global Average Pooling layer
x = GlobalAveragePooling2D()(x)
# Adding a fully connected layer having 1024 neurons
x = Dense(1024, activation='relu')(x)
# Adding a fully connected layer having 2 neurons which will
# give probability of image having either dog or cat
predictions = Dense(2, activation='softmax')(x)
# Model to be trained
model = Model(inputs=base_model.input, outputs=predictions)
# Training only top layers i.e. the layers which we have added in the end
for layer in base_model.layers:
    layer.trainable = False
# Compiling the model
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'],
              experimental_run_tf_function=False)
# Creating objects for image augmentations
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale = 1./255)
# Providing the paths of the training and test datasets
# Setting the image input size as (224, 224)
# We are using class_mode='categorical' since our two-class labels are one-hot encoded
training_set = train_datagen.flow_from_directory('training_set',
                                                 target_size=(224, 224),
                                                 batch_size=32,
                                                 class_mode='categorical')
test_set = test_datagen.flow_from_directory('test_set',
                                            target_size=(224, 224),
                                            batch_size=32,
                                            class_mode='categorical')
# Training the model for 5 epochs
model.fit_generator(training_set,
                    steps_per_epoch=8000,
                    epochs=5,
                    validation_data=test_set,
                    validation_steps=2000)
# We will try to train the last stage of ResNet50
for layer in base_model.layers[0:143]:
    layer.trainable = False
for layer in base_model.layers[143:]:
    layer.trainable = True
# Training the model for 10 epochs
model.fit_generator(training_set,
                    steps_per_epoch=8000,
                    epochs=10,
                    validation_data=test_set,
                    validation_steps=2000)
# Saving the weights in the current directory
model.save_weights("resnet50_weights.h5")
# Predicting the final result of image
test_image = image.load_img('cat_or_dog_test.jpg', target_size = (224, 224))
test_image = image.img_to_array(test_image)
# Expanding the 3-d image to 4-d image.
# The dimensions will be Batch, Height, Width, Channel
test_image = np.expand_dims(test_image, axis = 0)
# Predicting the final class
classifier = LogisticRegression()
result = classifier.predict(test_image)
# Fetching the class labels
labels = training_set.class_indices
labels = list(labels.items())
# Printing the final label
for label, i in labels:
    if i == result:
        print("The test image has: ", label)
        break
I had the same problem when using: from keras import Input;
But when I changed it to: from tensorflow.keras import Input, it worked!
I assume that the following line is where the error occurs:
test_image = np.expand_dims(test_image, axis = 0)
The reason is probably that you are trying to apply a NumPy function to a tensor. Don't do that. Either convert your tensor to NumPy, or use a function that works on tensors. Normally I'd say prefer the second option over the first (it avoids unnecessary conversions and makes your code more efficient). In your case you will need to convert your tensor to NumPy because you are using sklearn afterward:
test_image = np.expand_dims(test_image.numpy(), axis=0)
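Or, if you wanted to stay on the tensor side instead, a one-line alternative (assuming import tensorflow as tf):
test_image = tf.expand_dims(test_image, axis=0)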
I am new to DL and received a similar error, and the following helped me.
Try:
del base_model
Before:
base_model = ResNet50(include_top=False, weights='imagenet')
and, at the same time:
Try:
del model
Before:
model = Model(inputs=base_model.input, outputs=predictions)
Please let me know if this has helped you or hasn't :) .
Try using tensorflow.keras.something instead of keras.something.
It worked for me.
Of course, you also have to import tensorflow.
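For example, a minimal illustration of the swap:
# instead of:
# from keras.models import Model
# from keras.layers import Dense
# use:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense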
I am used to working in PyTorch but now have to learn TensorFlow for my job. I am trying to get up to speed by creating a simple dense network and training it on the MNIST dataset, but I cannot get it to train. My super simple code:
import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical
# Load mnist data from keras
(train_data, train_label), (test_data, test_label) = tf.keras.datasets.mnist.load_data(path="mnist.npz")
train_label, test_label = to_categorical(train_label), to_categorical(test_label)
train_data, train_label, test_data, test_label = Flatten()(train_data), Flatten()(train_label), Flatten()(test_data), Flatten()(test_label)
# Create generic SGD optimizer (no learning schedule)
optimizer = SGD(learning_rate = 0.01)
# Define function to build and compile model
def build_mnist_model(input_shape, batch_size=30):
    input_img = Input(shape=input_shape, batch_size=batch_size)
    # Pass through dense layers
    x = Dense(200, activation='relu', use_bias=True)(input_img)
    x = Dense(400, activation='relu', use_bias=True)(x)
    scores = Dense(10, activation='softmax', use_bias=True)(x)
    # Create and compile tf model
    mnist_model = Model(input_img, scores)
    mnist_model.compile(optimizer=optimizer, loss='categorical_crossentropy')
    return mnist_model
# Build the model
mnist_model = build_mnist_model(train_data[0].shape)
# Train the model
mnist_model.fit(
    x=train_data,
    y=train_label,
    batch_size=30,
    epochs=20,
    verbose=2,
    shuffle=True,
    # steps_per_epoch=200
)
When I run this I get
ValueError: When using data tensors as input to a model, you should specify the `steps_per_epoch` argument.
This does not really make sense to me, because my train_data and train_label are just regular tensors, and per the TensorFlow documentation it should in this case default to the number of samples in the dataset divided by the batch size (which would be 2000 in my case).
At any rate, I tried specifying steps_per_epoch = 200 when I call mnist_model.fit() but then I get a different error:
InvalidArgumentError: Incompatible shapes: [60000,10] vs. [30,1]
[[{{node training_4/SGD/gradients/gradients/loss_5/dense_17_loss/softmax_cross_entropy_with_logits_grad/mul}}]]
I can't seem to discern where a size mismatch would come from. In PyTorch, I am used to manually creating batches (by subindexing my data and label tensors), but in TensorFlow this seems to happen automatically. As such, this leaves me quite confused about which batch has the wrong size, how it got the wrong size, etc. I hope this simple model is easier than I am making it and I just do not know the TensorFlow tricks yet.
Thanks for the help.
This is my simple reproducible code:
from keras.callbacks import ModelCheckpoint
from keras.models import Model
from keras.models import load_model
import keras
import numpy as np
SEQUENCE_LEN = 45
LATENT_SIZE = 20
VOCAB_SIZE = 100
inputs = keras.layers.Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(VOCAB_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()
x = np.random.randint(0, 90, size=(10, SEQUENCE_LEN,VOCAB_SIZE))
y = np.random.normal(size=(10, SEQUENCE_LEN, VOCAB_SIZE))
NUM_EPOCHS = 1
checkpoint = ModelCheckpoint(filepath='checkpoint/{epoch}.hdf5')
history = autoencoder.fit(x, y, epochs=NUM_EPOCHS,callbacks=[checkpoint])
and here is my code to have a look at the weights in the encoder layer:
for epoch in range(1, NUM_EPOCHS + 1):
    file_name = "checkpoint/" + str(epoch) + ".hdf5"
    lstm_autoencoder = load_model(file_name)
    encoder = Model(lstm_autoencoder.input, lstm_autoencoder.get_layer('encoder_lstm').output)
    print(encoder.output_shape[1])
    weights = encoder.get_weights()[0]
    print(weights.shape)
    for idx in range(encoder.output_shape[1]):
        token_idx = np.argsort(weights[:, idx])[::-1]
Here print(encoder.output_shape) is (None, 20) and print(weights.shape) is (100, 80).
I understand that get_weights returns the weights of the layer.
The part I did not get, based on this architecture, is the 80: what is it?
Also, are these weights the ones that connect the encoder layer to the decoder? I mean the connection between the encoder and the decoder.
I had a look at this question here; as it only covers simple dense layers, I could not connect the concept to the seq2seq model.
Update 1
What is the difference between:
encoder.get_weights()[0] and encoder.get_weights()[1]?
The first one is (100, 80) and the second one is (20, 80); what are they, conceptually?
Any help is appreciated :)
The encoder, as you have defined it, is a model, and it consists of two layers: an input layer and the 'encoder_lstm' layer, which is the bidirectional LSTM layer of the autoencoder. So its output shape is the output shape of the 'encoder_lstm' layer, which is (None, 20) (because you have set LATENT_SIZE = 20 and merge_mode="sum"). So the output shape is correct and clear.
However, since encoder is a model, encoder.get_weights() returns the weights of all the layers in the model as a list. A bidirectional LSTM consists of two separate LSTM layers, and each of those LSTM layers has 3 weight arrays: the kernel, the recurrent kernel, and the biases. So encoder.get_weights() returns a list of 6 arrays, 3 for each of the LSTM layers. The first element of this list, which you stored in weights and which is the subject of your question, is the kernel of one of the LSTM layers. The kernel of an LSTM layer has a shape of (input_dim, 4 * lstm_units). The input dimension of the 'encoder_lstm' layer is VOCAB_SIZE and its number of units is LATENT_SIZE; therefore the kernel shape is (VOCAB_SIZE, 4 * LATENT_SIZE) = (100, 80). As for your update: encoder.get_weights()[1] is the recurrent kernel of the same LSTM layer, of shape (lstm_units, 4 * lstm_units) = (20, 80); it is applied to the hidden state rather than to the input.
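You can verify this quickly with a standalone sketch (numbers chosen to match your SEQUENCE_LEN, VOCAB_SIZE and LATENT_SIZE):
import keras
inputs = keras.layers.Input(shape=(45, 100))
encoded = keras.layers.Bidirectional(keras.layers.LSTM(20), merge_mode="sum")(inputs)
encoder = keras.models.Model(inputs, encoded)
for w in encoder.get_weights():
    print(w.shape)
# Expected: (100, 80), (20, 80), (80,) -- kernel, recurrent kernel and bias
# of the forward LSTM, followed by the same three shapes for the backward LSTM.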