I have a pre-trained model which I load like so:
from transformers import BertForSequenceClassification, AdamW, BertConfig, BertModel
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",          # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2,               # The number of output labels--2 for binary classification.
                                  # You can increase this for multi-class tasks.
    output_attentions = False,    # Whether the model returns attention weights.
    output_hidden_states = False, # Whether the model returns all hidden-states.
)
I want to create a new model with the same architecture, and random initial weights, except for the embedding layer:
==== Embedding Layer ====
bert.embeddings.word_embeddings.weight (30522, 768)
bert.embeddings.position_embeddings.weight (512, 768)
bert.embeddings.token_type_embeddings.weight (2, 768)
bert.embeddings.LayerNorm.weight (768,)
bert.embeddings.LayerNorm.bias (768,)
It seems I can do this to create a new model with the same architecture, but then all the weights are random:
configuration = model.config
untrained_model = BertForSequenceClassification(configuration)
So how do I copy over model's embedding layer weights to the new untrained_model?
Weights and biases are just tensors, and you can simply copy them with copy_:
from transformers import BertForSequenceClassification, BertConfig
jetfire = BertForSequenceClassification.from_pretrained('bert-base-cased')
config = BertConfig.from_pretrained('bert-base-cased')
optimus = BertForSequenceClassification(config)
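parts = ['bert.embeddings.word_embeddings.weight',
         'bert.embeddings.position_embeddings.weight',
         'bert.embeddings.token_type_embeddings.weight',
         'bert.embeddings.LayerNorm.weight',
         'bert.embeddings.LayerNorm.bias']
def joltElectrify(jetfire, optimus, parts):
    target = dict(optimus.named_parameters())
    source = dict(jetfire.named_parameters())
    for part in parts:
        # copy the pre-trained values into the freshly initialized model, in place
        target[part].data.copy_(source[part].data)
joltElectrify(jetfire, optimus, parts)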
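To check the idea without downloading BERT, here is a minimal sketch on a toy module. `TinyEncoder` and `copy_matching_parameters` are hypothetical names made up for this illustration, and selecting parameters by name prefix is an assumption (the answer above lists the names explicitly instead).

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for BERT: an embedding layer plus a classifier head."""

    def __init__(self, vocab_size=10, hidden=4, num_labels=2):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, hidden)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, token_ids):
        return self.classifier(self.embeddings(token_ids).mean(dim=1))


def copy_matching_parameters(source, target, prefix):
    """Copy every parameter whose name starts with `prefix` from source to target."""
    source_params = dict(source.named_parameters())
    copied = []
    with torch.no_grad():  # the in-place copy should not be tracked by autograd
        for name, param in target.named_parameters():
            if name.startswith(prefix):
                param.copy_(source_params[name])
                copied.append(name)
    return copied


torch.manual_seed(0)
pretrained = TinyEncoder()   # plays the role of the pre-trained model
fresh = TinyEncoder()        # same architecture, different random weights
copied_names = copy_matching_parameters(pretrained, fresh, prefix="embeddings.")
print(copied_names)
```

For the real model the prefix would presumably be 'bert.embeddings.', matching the parameter names listed in the question.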
I want to use the pre-trained models in PyTorch to do image classification on my own datasets. How should I change the number of classes while freezing the parameters of the feature extraction layers?
These are the models I want to include:
resnet18 = models.resnet18(pretrained=True)
densenet161 = models.densenet161(pretrained=True)
inception_v3 = models.inception_v3(pretrained=True)
shufflenet_v2_x1_0 = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
mnasnet1_0 = models.mnasnet1_0(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
Thanks a lot in advance!
New code I added:
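import torch
from torch import nn
from torchvision import models
class MyResModel(torch.nn.Module):
    def __init__(self):
        super(MyResModel, self).__init__()
        self.classifier = nn.Sequential(
            # (the layers are missing from the original post)
        )
    def forward(self, x):
        return self.classifier(x)
resnet18 = models.resnet18(pretrained=True)
resnet18.fc = MyResModel()
for param in resnet18.parameters():
    # loop body missing from the original post; presumably the freezing step
    param.requires_grad = False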
You have to change the final Linear layer of the respective model.
For example in the case of resnet, when we print the model, we see that the last layer is a fully connected layer as shown below:
(fc): Linear(in_features=512, out_features=1000, bias=True)
Thus, you must reinitialize model.fc to be a Linear layer with 512 input features and num_classes output features (2 for binary classification):
model.fc = nn.Linear(512, num_classes)
For other models you can check here
To freeze the parameters of the network you have to use the following code:
for name, param in model.named_parameters():
    if 'fc' not in name:
        print(name, param.requires_grad)
        param.requires_grad = False
To validate:
for name, param in model.named_parameters():
    print(name, param.requires_grad)
Note that for this example 'fc' was the name of the classification layer. This is not the case for other models. You have to inspect the model in order to find the name of the classification layer.
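The two steps above (replace the head, freeze the rest) can be sketched together as follows. `TinyNet` and `replace_head_and_freeze` are hypothetical names for illustration. The toy model only imitates torchvision's ResNet in that its classification layer is called `fc`, so nothing needs to be downloaded.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical small model whose classification head is named `fc`."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))


def replace_head_and_freeze(model, num_classes, head_name="fc"):
    """Freeze all existing parameters, then attach a fresh trainable head."""
    old_head = getattr(model, head_name)
    for param in model.parameters():
        param.requires_grad = False
    # The new layer is created after freezing, so it keeps requires_grad=True.
    setattr(model, head_name, nn.Linear(old_head.in_features, num_classes))
    return model


model = replace_head_and_freeze(TinyNet(), num_classes=2)
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)

# Only hand the trainable parameters to the optimizer.
optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.01)
output_shape = tuple(model(torch.zeros(1, 3, 8, 8)).shape)
```

Freezing first and replacing the head afterwards avoids matching on the substring 'fc', which could also hit unrelated parameter names in other architectures. For models whose head has a different name (for example `classifier`), `head_name` would have to be adapted after inspecting the printed model.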
I am making a CNN model to use for lane detection, but TensorFlow 2 does not have tf.contrib, so I cannot access the fully_connected layer.
How can I make my own Fully connected layer function?
This is my model so far:
conv2d = tf.nn.conv2d
batch_norm = tf.nn.batch_normalization
dropout = tf.nn.dropout
max_pool = tf.nn.max_pool2d
softmax = tf.nn.softmax
relu = tf.nn.relu
avg_pool = tf.nn.avg_pool2d
checkpoint = tf.train.Checkpoint
def network(x):
    model = conv2d(x,filters=[1,5,5,1],strides=[1,2,2,1],padding='SAME')
    model = relu(model)
    model = batch_norm(model)
    model = max_pool(model)
    model = conv2d(model,filters=[1,4,4,1],strides=[1,2,2,1],padding='SAME')
    model = relu(model)
    model = batch_norm(model)
    model = max_pool(model)
    model = conv2d(model,filters=[1,3,3,1],strides=[1,2,2,1],padding='SAME')
    model = relu(model)
    model = batch_norm(model)
    model = avg_pool(model)
    model = dropout(model,0.3)
    # i want to add the fully connect layer here then a softmax layer then another fully connected
I think what you might be looking for is the Dense layer in the keras module - https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
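For intuition about what such a layer does, here is a minimal NumPy sketch of the computation a fully connected layer performs, namely activation(x @ W + b). It is not TensorFlow code. The shapes and the helper names `fully_connected`, `relu` and `softmax` are assumptions chosen for illustration.

```python
import numpy as np


def fully_connected(x, weights, bias, activation=None):
    """Compute activation(x @ weights + bias) for a batch of row vectors."""
    z = x @ weights + bias
    return activation(z) if activation is not None else z


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                       # batch of 4 flattened feature vectors
w1, b1 = rng.normal(size=(16, 8)) * 0.1, np.zeros(8)
w2, b2 = rng.normal(size=(8, 3)) * 0.1, np.zeros(3)

hidden = fully_connected(x, w1, b1, activation=relu)
probs = fully_connected(hidden, w2, b2, activation=softmax)
print(probs.shape)
```

In TensorFlow 2 itself, `tf.keras.layers.Dense(units, activation=...)` creates and trains the weight matrix and bias for you, so after flattening the convolutional output there is usually no need to write this by hand.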
I want to run a seq2seq model using LSTM for a customer journey analysis. I am able to run the model, but I am unable to load the saved model in a different notebook.
Code for attention model is here:
# RNN "Cell" classes in Keras perform the actual data transformations at each timestep. Therefore, in order to add attention to LSTM, we need to make a custom subclass of LSTMCell.
class AttentionLSTMCell(LSTMCell):
    def __init__(self, **kwargs):
        self.attentionMode = False
        super(AttentionLSTMCell, self).__init__(**kwargs)

    # Build is called to initialize the variables that our cell will use. We will let other Keras
    # classes (e.g. "Dense") actually initialize these variables.
    def build(self, input_shape):
        # Converts the input sequence into a sequence which can be matched up to the internal
        # hidden state.
        self.dense_constant = TimeDistributed(Dense(self.units, name="AttLstmInternal_DenseConstant"))
        # Transforms the internal hidden state into something that can be used by the attention
        # mechanism.
        self.dense_state = Dense(self.units, name="AttLstmInternal_DenseState")
        # Transforms the combined hidden state and converted input sequence into a vector of
        # probabilities for attention.
        self.dense_transform = Dense(1, name="AttLstmInternal_DenseTransform")
        # We will augment the input into LSTMCell by concatenating the context vector. Modify
        # input_shape to reflect this.
        batch, input_dim = input_shape[0]
        batch, timesteps, context_size = input_shape[-1]
        lstm_input = (batch, input_dim + context_size)
        # The LSTMCell superclass expects no constant input, so strip that out.
        return super(AttentionLSTMCell, self).build(lstm_input)

    # This must be called before call(). The "input sequence" is the output from the
    # encoder. This function will do some pre-processing on that sequence which will
    # then be used in subsequent calls.
    def setInputSequence(self, input_seq):
        self.input_seq = input_seq
        self.input_seq_shaped = self.dense_constant(input_seq)
        self.timesteps = tf.shape(self.input_seq)[-2]

    # This is a utility method to adjust the output of this cell. When attention mode is
    # turned on, the cell outputs attention probability vectors across the input sequence.
    def setAttentionMode(self, mode_on=False):
        self.attentionMode = mode_on

    # This method sets up the computational graph for the cell. It implements the actual logic
    # that the model follows.
    def call(self, inputs, states, constants):
        # Separate the state list into the two discrete state vectors.
        # ytm is the "memory state", stm is the "carry state".
        ytm, stm = states
        # We will use the "carry state" to guide the attention mechanism. Repeat it across all
        # input timesteps to perform some calculations on it.
        stm_repeated = K.repeat(self.dense_state(stm), self.timesteps)
        # Now apply our "dense_transform" operation on the sum of our transformed "carry state"
        # and all encoder states. This will squash the resultant sum down to a vector of size
        # [batch,timesteps,1]
        # Note: Most sources I encounter use tanh for the activation here. I have found with this dataset
        # and this model, relu seems to perform better. It makes the attention mechanism far more crisp
        # and produces better translation performance, especially with respect to proper sentence termination.
        combined_stm_input = self.dense_transform(
            keras.activations.relu(stm_repeated + self.input_seq_shaped))
        # Performing a softmax generates a probability for each encoder output to receive attention.
        score_vector = keras.activations.softmax(combined_stm_input, 1)
        # In this implementation, we grant "partial attention" to each encoder output based on
        # its probability computed above. Other options would be to only give attention
        # to the highest probability encoder output or some similar set.
        context_vector = K.sum(score_vector * self.input_seq, 1)
        # Finally, mutate the input vector. It will now contain the traditional inputs (like the seq2seq
        # we trained above) in addition to the attention context vector we calculated earlier in this method.
        inputs = K.concatenate([inputs, context_vector])
        # Call into the super-class to invoke the LSTM math.
        res = super(AttentionLSTMCell, self).call(inputs=inputs, states=states)
        # This if statement switches the return value of this method if "attentionMode" is turned on.
        if self.attentionMode:
            return (K.reshape(score_vector, (-1, self.timesteps)), res[1])
        else:
            return res
# Custom implementation of the Keras LSTM that adds an attention mechanism.
# This is implemented by taking an additional input (using the "constants" of the RNN class) into the LSTM:
# the encoder output vectors across the entire input sequence.
class LSTMWithAttention(RNN):
    def __init__(self, units, **kwargs):
        cell = AttentionLSTMCell(units=units)
        self.units = units
        super(LSTMWithAttention, self).__init__(cell, **kwargs)

    def build(self, input_shape):
        self.input_dim = input_shape[0][-1]
        self.timesteps = input_shape[0][-2]
        return super(LSTMWithAttention, self).build(input_shape)

    # This call is invoked with the entire time sequence. The RNN sub-class is responsible
    # for breaking this up into calls into the cell for each step.
    # The "constants" variable is the key to our implementation. It was specifically added
    # to Keras to accommodate the "attention" mechanism we are implementing.
    def call(self, x, constants, **kwargs):
        if isinstance(x, list):
            self.x_initial = x[0]
        else:
            self.x_initial = x
        # The only difference in the LSTM computational graph really comes from the custom
        # LSTM Cell that we utilize.
        self.cell._dropout_mask = None
        self.cell._recurrent_dropout_mask = None
        return super(LSTMWithAttention, self).call(inputs=x, constants=constants, **kwargs)
Code defining encoder and decoder model:
# Encoder Layers
encoder_inputs = Input(shape=(None,len_input), name="attenc_inputs")
encoder = LSTM(units=units, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder((encoder_inputs))
encoder_states = [state_h, state_c]
# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)
# define training decoder
decoder_inputs = Input(shape=(None, n_output))
Attention_dec_lstm = LSTMWithAttention(units=units, return_sequences=True, return_state=True)
# Note that the only real difference here is that we are feeding attenc_outputs to the decoder now.
attdec_lstm_out, _, _ = Attention_dec_lstm(inputs=decoder_inputs,
                                           constants=encoder_outputs,
                                           initial_state=encoder_states)
decoder_dense1 = Dense(units, activation="relu")
decoder_dense2 = Dense(n_output, activation='softmax')
decoder_outputs = decoder_dense2(Dropout(rate=.10)(decoder_dense1(Dropout(rate=.10)(attdec_lstm_out))))
atten_model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
atten_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
#Defining inference decoder
state_input_h = Input(shape=(units,), name="state_input_h")
state_input_c = Input(shape=(units,), name="state_input_c")
decoder_states_inputs = [state_input_h, state_input_c]
attenc_seq_out = Input(shape=encoder_outputs.get_shape()[1:], name="attenc_seq_out")
inf_attdec_inputs = Input(shape=(None,n_output), name="inf_attdec_inputs")
attdec_res, attdec_h, attdec_c = Attention_dec_lstm(inputs=inf_attdec_inputs,
                                                    constants=attenc_seq_out,
                                                    initial_state=decoder_states_inputs)
decoder_states = [attdec_h, attdec_c]
decoder_model = Model(inputs=[inf_attdec_inputs, state_input_h, state_input_c, attenc_seq_out],
outputs=[attdec_res, attdec_h, attdec_c])
Code for model fit and save:
history = atten_model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
Code to load the encoder decoder model with custom Attention layer:
with open('atten_model_lstm.json') as mdl:
    json_string = mdl.read()
model = model_from_json(json_string, custom_objects={'AttentionLSTMCell': AttentionLSTMCell, 'LSTMWithAttention': LSTMWithAttention})
Loading the model with this code raises the following error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'AttentionLSTMCell'
Here's a solution inspired by the link in my comment:
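# serialize model to JSON
atten_model_json = atten_model.to_json()
with open("atten_model.json", "w") as json_file:
    json_file.write(atten_model_json)
# serialize weights to HDF5 (the weights filename is an assumption; it is missing from the original snippet)
atten_model.save_weights("atten_model.h5")
print("Saved model to disk")
# Different part of your code or different file
# load json and create model
with open('atten_model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
# the custom classes must be importable here and passed in, as in the question
loaded_model = model_from_json(loaded_model_json, custom_objects={'AttentionLSTMCell': AttentionLSTMCell, 'LSTMWithAttention': LSTMWithAttention})
# load weights into new model
loaded_model.load_weights("atten_model.h5")
print("Loaded model from disk")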
I've successfully trained and saved my model (an image classifier) using TensorFlow, so now I have the .meta, index and checkpoint files.
I wanted to feed my model an image for testing, so I created another .py file and restored my model:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('model-24900.meta')
    graph = tf.get_default_graph()
After that I tried to feed an image:
Prediction=sess.run([output],feed_dict={input_img : testImage,})
The problem is that 'output' and 'input_img' are defined in another file (where I constructed and trained the model), so they are undefined in the file where I want to test the model.
This is how I wrote in my train file:
with tf.name_scope("Input") as scope:
    input_img = tf.placeholder(dtype='float', shape=[None, 128, 128, 1], name="input")
with tf.name_scope("Target") as scope:
    target_labels = tf.placeholder(dtype='float', shape=[None, 2], name="Targets")
nb = NetworkBuilder()
with tf.name_scope("ModelV2") as scope:
    model = input_img
    model = nb.attach_conv_layer(model, 32)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 32)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.attach_conv_layer(model, 64)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 64)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.attach_conv_layer(model, 128)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 128)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.flatten(model)
    model = nb.attach_dense_layer(model, 200)
    model = nb.attach_sigmoid_layer(model)
    model = nb.attach_dense_layer(model, 32)
    model = nb.attach_sigmoid_layer(model)
    model = nb.attach_dense_layer(model, 2)
    output = nb.attach_softmax_layer(model)
with tf.name_scope("Optimization") as scope:
    global_step = tf.Variable(0, name='global_step', trainable=False)
    cost = tf.nn.softmax_cross_entropy_with_logits(logits=model, labels=target_labels)
    cost = tf.reduce_mean(cost)
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.AdamOptimizer().minimize(cost,global_step=global_step)
with tf.name_scope('accuracy') as scope:
    correct_pred = tf.equal(tf.argmax(output, 1), tf.argmax(target_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
So my question is: how can I define 'output' and 'input_img' (which I used in the train file) in my test file, so that I can feed an image to my CNN model?
1] Testing by .ckpt file and recreating the whole model:
You can redefine the whole model in the test .py file, with the same input and output tensors that you defined in the training .py file.
Then make a forward pass by feeding the test image to the input tensor and reading the predictions from the output tensor; no training should be performed.
The model you define in the test .py file must have the same structure as the one used in training.
2] Testing by .ckpt file and using the name of the tensors:
You can explicitly name tensors during training.
In the test .py file you can then get the input and output tensors with .get_tensor_by_name("example:0"), using the names you assigned during training.
You can use these tensors in sess.run, feeding data to the input tensor and fetching the prediction from the output tensor.
3] Testing by .pb frozen graph and using the name of the tensors:
With the two methods above the restored graph is still trainable, and the checkpoint files are larger than a .pb file.
A .pb frozen file is a frozen graph, which is not trainable.
You can use this file to import your frozen graph.
You can then get the input and output tensors with .get_tensor_by_name("example:0") and make predictions.
There are two ways you can know the name of the tensor:
1] Using Tensor-Board:
Save your model after training.
Open a terminal and run the command tensorboard --logdir="path_where_you_have_stored_ckpt_file"
Open the URL that TensorBoard prints in your web browser.
Go to the Graphs section and identify the name of the tensor by clicking on that particular tensor node.
2] In the code:
Every tensor has a name property.
You can list the names of all nodes in the graph like this:
graph_def = tf.get_default_graph().as_graph_def()
for node in graph_def.node:
    print(node.name)
If you want the name of a particular tensor, you can do something like this:
x = tf.placeholder(tf.float32, [None, 784])
print(x.name)
Once you have the name, you can use it to retrieve the tensor from the graph with .get_tensor_by_name(x.name).
I would like to include my custom pre-processing logic in my exported Keras model for use in Tensorflow Serving.
My pre-processing performs string tokenization and uses an external dictionary to convert each token to an index for input to the Embedding layer:
from keras.preprocessing import sequence
token_to_idx_dict = ... #read from file
# Custom Pythonic pre-processing steps on input_data
tokens = [tokenize(s) for s in input_data]
token_idxs = [[token_to_idx_dict[t] for t in ts] for ts in tokens]
tokens_padded = sequence.pad_sequences(token_idxs, maxlen=maxlen)
Model architecture and training:
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(LSTM(128, activation='sigmoid'))
model.add(Dense(n_classes, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(x_train, y_train)
Since the model will be used in Tensorflow Serving, I want to incorporate all pre-processing logic into the model itself (encoded in the exported model file).
Q: How can I do so using the Keras library only?
I found a guide that explains how to combine Keras and Tensorflow, but I'm still unsure how to export everything as one model.
I know Tensorflow has built-in string splitting, file I/O, and dictionary lookup operations.
Pre-processing logic using Tensorflow operations:
# Get input text
input_string_tensor = tf.placeholder(tf.string, shape=(1,))
# Split input text by whitespace
splitted_string = tf.string_split(input_string_tensor, " ")
# Read index lookup dictionary
token_to_idx_dict = tf.contrib.lookup.HashTable(tf.contrib.lookup.TextFileInitializer("vocab.txt", tf.string, 0, tf.int64, 1, delimiter=","), -1)
# Convert tokens to indexes
token_idxs = token_to_idx_dict.lookup(splitted_string)
# Pad zeros to fixed length
token_idxs_padded = tf.pad(token_idxs, ...)
Q: How can I use these Tensorflow pre-defined pre-processing operations and my Keras layers together to both train and then export the model as a "black box" for use in Tensorflow Serving?
I figured it out, so I'm going to answer my own question here.
Here's the gist:
First, (in separate code file) I trained the model using Keras only with my own pre-processing functions, exported the Keras model weights file and my token-to-index dictionary.
Then, I copied just the Keras model architecture, set the input as the pre-processed tensor output, loaded the weights file from the previously trained Keras model, and sandwiched it between the Tensorflow pre-processing operations and the Tensorflow exporter.
Final product:
import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from tensorflow.contrib.session_bundle import exporter
from tensorflow.contrib.lookup import HashTable, TextFileInitializer
# Initialize Keras with Tensorflow session
sess = tf.Session()
K.set_session(sess)
# Token to index lookup dictionary
token_to_idx_path = '...'
token_to_idx_dict = HashTable(TextFileInitializer(token_to_idx_path, tf.string, 0, tf.int64, 1, delimiter='\t'), 0)
maxlen = ...
# Pre-processing sub-graph using Tensorflow operations
input = tf.placeholder(tf.string, name='input')
sparse_tokenized_input = tf.string_split(input)
tokenized_input = tf.sparse_tensor_to_dense(sparse_tokenized_input, default_value='')
token_idxs = token_to_idx_dict.lookup(tokenized_input)
token_idxs_padded = tf.pad(token_idxs, [[0,0],[0,maxlen]])
token_idxs_embedding = tf.slice(token_idxs_padded, [0,0], [-1,maxlen])
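# Initialize Keras model
model = Sequential()
e = Embedding(max_features, 128, input_length=maxlen)
e.set_input(token_idxs_embedding)  # old Keras 1.x API: use the pre-processed tensor as the layer input
model.add(e)
model.add(LSTM(128, activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))
# Load weights from previously trained Keras model
weights_path = '...'
model.load_weights(weights_path)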
# Export model in Tensorflow format
# (Official tutorial: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/serving_basic.md)
saver = tf.train.Saver(sharded=True)
model_exporter = exporter.Exporter(saver)
signature = exporter.classification_signature(input_tensor=model.input, scores_tensor=model.output)
model_exporter.init(sess.graph.as_graph_def(), default_graph_signature=signature)
model_dir = '...'
model_version = 1
model_exporter.export(model_dir, tf.constant(model_version), sess)
# Input example
with sess.as_default():
sess.run(model.output, feed_dict={input: ["this is a raw input example"]})
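To sanity-check what the pre-processing sub-graph above produces (split on whitespace, look up each token, pad, then slice to a fixed length), the same logic can be mirrored in plain Python. This is only a sketch: the `preprocess` helper and the toy vocabulary are hypothetical, and index 0 is assumed to serve both as the default for unknown tokens and as the padding value, as in the `HashTable` default above.

```python
def preprocess(texts, token_to_idx, maxlen, unknown_idx=0, pad_idx=0):
    """Mirror the TF ops: split -> lookup -> pad on the right -> slice to maxlen."""
    batch = []
    for text in texts:
        tokens = text.split()                                  # like tf.string_split
        idxs = [token_to_idx.get(tok, unknown_idx) for tok in tokens]
        padded = idxs + [pad_idx] * maxlen                     # like tf.pad(..., [[0,0],[0,maxlen]])
        batch.append(padded[:maxlen])                          # like tf.slice(..., [0,0], [-1,maxlen])
    return batch


# Hypothetical vocabulary, for illustration only.
vocab = {"this": 1, "is": 2, "a": 3, "raw": 4, "input": 5, "example": 6}
rows = preprocess(["this is a raw input example", "a new input"], vocab, maxlen=4)
print(rows)
```

One thing worth checking against the training code: Keras `pad_sequences` pads and truncates at the front by default (`padding='pre'`, `truncating='pre'`), whereas the pad-then-slice pattern here pads and truncates at the end, so the two pipelines may not produce identical inputs unless `pad_sequences` is called with `padding='post', truncating='post'`.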
The accepted answer is super helpful; however, it uses an outdated Keras API, as @Qululu mentioned, and an outdated TF Serving API (Exporter), and it does not show how to export the model so that its input is the original tf placeholder (rather than the Keras model.input, which comes after pre-processing). The following version works well as of TF v1.4 and Keras 2.1.2:
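import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import InputLayer
from tensorflow.contrib.lookup import TextFileIndex
from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import tag_constants
sess = tf.Session()
K._LEARNING_PHASE = tf.constant(0)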
max_features = 5000
max_lens = 500
dict_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.TextFileInitializer("vocab.txt",tf.string, 0, tf.int64, TextFileIndex.LINE_NUMBER, vocab_size=max_features, delimiter=" "), 0)
x_input = tf.placeholder(tf.string, name='x_input', shape=(None,))
sparse_tokenized_input = tf.string_split(x_input)
tokenized_input = tf.sparse_tensor_to_dense(sparse_tokenized_input, default_value='')
token_idxs = dict_table.lookup(tokenized_input)
token_idxs_padded = tf.pad(token_idxs, [[0,0],[0, max_lens]])
token_idxs_embedding = tf.slice(token_idxs_padded, [0,0], [-1, max_lens])
model = Sequential()
model.add(InputLayer(input_tensor=token_idxs_embedding, input_shape=(None, max_lens)))
x_info = tf.saved_model.utils.build_tensor_info(x_input)
y_info = tf.saved_model.utils.build_tensor_info(model.output)
prediction_signature = tf.saved_model.signature_def_utils.build_signature_def(inputs={"text": x_info}, outputs={"prediction":y_info}, method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)
builder = saved_model_builder.SavedModelBuilder("/path/to/model")
legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
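# Add the meta_graph and the variables to the builder
builder.add_meta_graph_and_variables(
    sess, [tag_constants.SERVING],
    signature_def_map={
        # the signature key is an assumption; the original call is truncated here
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature
    },
    legacy_init_op=legacy_init_op)
builder.save()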
UPDATE: Pre-processing for inference with TensorFlow ops runs on the CPU and is not carried out efficiently if the model is deployed on a GPU server. The GPU stalls badly and throughput is very low. We therefore dropped this approach in favor of efficient pre-processing in the client process.