I am trying to convert a saved model in TensorFlow 1 to TensorFlow 2. I am migrating the code to TensorFlow 2, as highlighted in the TensorFlow docs. However, I would like to simply update my model_weights.ckpt to TensorFlow 2. Some weights (Linear, Embedding) have a shape similar to the TensorFlow 2 syntax, but I am struggling to transform the weights from my GRUCell.
How do I convert the GRUCell weights from compat.v1.nn.rnn_cell.GRUCell to keras.layers.GRUCell?
The GRUCell has four weights (S being the input size and H the hidden size):
gru_cell/gates/kernel:0 of shape (S + H, 2 x H),
gru_cell/gates/bias:0 of shape (2 x H, ),
gru_cell/candidate/kernel:0 of shape (S + H, H),
gru_cell/candidate/bias:0 of shape (H, )
I would like to have weights with a shape similar to the TensorFlow 2 API (or the PyTorch API), i.e. a GRUCell with the following weights:
gru_cell/kernel:0 of shape (S, 3 x H)
gru_cell/recurrent_kernel:0 of shape (H, 3 x H)
gru_cell/bias:0 of shape (2, 3 x H)
To illustrate, you can reproduce these results:
1. GRUCell with the TensorFlow 1 API
import tensorflow as tf
SEQ_LENGTH = 4
HIDDEN_SIZE = 512
BATCH_SIZE = 1
inputs = tf.random.normal([BATCH_SIZE, SEQ_LENGTH])
# GRU cell
gru = tf.compat.v1.nn.rnn_cell.GRUCell(HIDDEN_SIZE)
# Hidden state
state = gru.zero_state(BATCH_SIZE, tf.float32)
# Forward
output, state = gru(inputs, state)
for weight in gru.weights:
    print(weight.name, weight.shape)
Output:
gru_cell/gates/kernel:0 (516, 1024)
gru_cell/gates/bias:0 (1024,)
gru_cell/candidate/kernel:0 (516, 512)
gru_cell/candidate/bias:0 (512,)
2. GRUCell with the TensorFlow 2 API
import tensorflow as tf
SEQ_LENGTH = 4
HIDDEN_SIZE = 512
BATCH_SIZE = 1
inputs = tf.random.normal([BATCH_SIZE, SEQ_LENGTH])
# GRU cell
gru = tf.keras.layers.GRUCell(HIDDEN_SIZE)
# Hidden state
state = tf.zeros((BATCH_SIZE, HIDDEN_SIZE), dtype=tf.float32)
# Forward
output, state = gru(inputs, state)
# Display the weights
for weight in gru.weights:
    print(weight.name, weight.shape)
Output:
gru_cell/kernel:0 (4, 1536)
gru_cell/recurrent_kernel:0 (512, 1536)
gru_cell/bias:0 (2, 1536)
Note
I tried TensorFlow's _convert_rnn_weights function to convert the desired weights. It works, but only for CuDNN weights, so I can't use it in my case.
For the benefit of the community, providing the solution here even though it is presented on GitHub.
In short, the weights of compat.v1.nn.rnn_cell.GRUCell and keras.layers.GRUCell are not compatible with each other. There is no function to convert between them, and if you really want to do it, you will need to do it manually.
Math-wise, if you have the numpy values of the v1 weights, the formulas are (S is the input size and H the state size, matching the shapes above; gates_kernel, gates_bias, candidate_kernel and candidate_bias are the numpy values of the four v1 variables listed in the question):
S = input_size
H = state_size
all_kernel = np.concatenate([gates_kernel, candidate_kernel], axis=1)  # shape (S + H, 3 * H)
kernel = all_kernel[:S]  # shape (S, 3 * H)
recurrent_kernel = all_kernel[S:]  # shape (H, 3 * H)
input_bias = np.concatenate([gates_bias, candidate_bias], axis=0)  # shape (3 * H,)
recurrent_bias = np.zeros([3 * H])  # the Keras cell keeps a second, recurrent bias
bias = np.stack([input_bias, recurrent_bias], axis=0)  # shape (2, 3 * H)
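To actually load the converted arrays into a Keras cell, a minimal sketch could look like this (the random arrays below merely stand in for the values computed above; note that even with matching shapes the two cells are not guaranteed to be numerically equivalent, since the gate ordering and the reset_after formulation differ between the v1 and Keras implementations):
import numpy as np
import tensorflow as tf
S, H = 4, 512  # input size and state size from the example above
# Stand-ins for the arrays computed with the formulas above
kernel = np.random.randn(S, 3 * H).astype(np.float32)
recurrent_kernel = np.random.randn(H, 3 * H).astype(np.float32)
bias = np.zeros((2, 3 * H), dtype=np.float32)
cell = tf.keras.layers.GRUCell(H)
cell.build((None, S))  # create the variables so set_weights has targets
cell.set_weights([kernel, recurrent_kernel, bias])
for weight in cell.weights:
    print(weight.name, weight.shape)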
Related
I have a custom loss where I want to apply a Gaussian filter to the predicted label to manipulate it a little. Using max or average pooling is simple, as it is predefined in Keras, but I had to make my own class for Gaussian pooling:
import numpy as np
from keras.layers import DepthwiseConv2D
from keras.layers import Input
from keras.models import Model
import tensorflow as tf
class Gaussian():
    def __init__(self, shape, f=3):
        self.filt = f
        self.g = self.gaussFilter(shape)

    def doFilter(self, data):
        return self.g.predict(data, steps=1)  # steps are for predicting on a const tensor, I change it when predicting on predictions

    def gauss2D(self, shape=(3, 3), sigma=0.5):
        m, n = [(ss - 1.) / 2. for ss in shape]
        y, x = np.ogrid[-m:m+1, -n:n+1]
        h = np.exp(-(x * x + y * y) / (2. * sigma * sigma))
        h[h < np.finfo(h.dtype).eps * h.max()] = 0
        sumh = h.sum()
        if sumh != 0:
            h /= sumh
        return h

    def gaussFilter(self, size=256):
        kernel_weights = self.gauss2D(shape=(self.filt, self.filt))
        in_channels = 1  # the number of input channels
        kernel_weights = np.expand_dims(kernel_weights, axis=-1)
        kernel_weights = np.repeat(kernel_weights, in_channels, axis=-1)  # apply the same filter on all the input channels
        kernel_weights = np.expand_dims(kernel_weights, axis=-1)  # for shape compatibility reasons
        inp = Input(shape=(size, size, 1))
        g_layer = DepthwiseConv2D(self.filt, use_bias=False, padding='same')(inp)
        model_network = Model(input=inp, output=g_layer)
        print(model_network.summary())
        model_network.layers[1].set_weights([kernel_weights])
        model_network.trainable = False
        return model_network
This works as expected when feeding a constant tensor to the doFilter function; here is an example with simple data:
a = np.array([[[1, 2, 3], [4, 5, 6], [4, 5, 6]]])
filt = Gaussian(3)
print(filt.doFilter(tf.constant(a.reshape(1,3,3,1))))
However, if I try to use this in a custom loss:
def custom_loss_no_true(input_tensor, length):
    def loss(y_true, y_pred):
        gaus_pooler = Gaussian(256, length // 8)
        a = gaus_pooler.doFilter(y_pred)
        # ...more stuff comes after
I get an error:
ValueError: When feeding symbolic tensors to a model, we expect the
tensors to have a static batch size. Got tensor with shape: (None,
256, 256, 1)
As I have found, this is caused by the fact that I am feeding a tensor that is the output of another model, i.e. symbolic data, not actual values (source). Thus I need to change the logic of my approach, because evaluating the tensor to feed my class would break the graph and prevent gradient propagation within the loss (or am I incorrect?). How can I apply such a convolution operation on a tensor that is the output of another model? Is it even possible? Or maybe there is a way to use it without adding the layer to the model, as with MaxPooling?
You don't really need a complex Keras Model or a Keras Layer if all you want to do is convolve your input with a Gaussian kernel. Here is a port of your code using simple TensorFlow ops:
import tensorflow as tf

def get_gaussian_kernel(shape=(3, 3), sigma=0.5):
    """Build the Gaussian filter"""
    m, n = [(ss - 1.) / 2. for ss in shape]
    x = tf.expand_dims(tf.range(-n, n + 1, dtype=tf.float32), 1)
    y = tf.expand_dims(tf.range(-m, m + 1, dtype=tf.float32), 0)
    h = tf.exp(tf.math.divide_no_nan(-((x * x) + (y * y)), 2 * sigma * sigma))
    h = tf.math.divide_no_nan(h, tf.reduce_sum(h))
    return h

def gaussian_blur(inp, shape=(3, 3), sigma=0.5):
    """Convolve using tf.nn.depthwise_conv2d"""
    in_channel = tf.shape(inp)[-1]
    k = get_gaussian_kernel(shape, sigma)
    k = tf.expand_dims(k, axis=-1)
    k = tf.repeat(k, in_channel, axis=-1)
    k = tf.reshape(k, (*shape, in_channel, 1))
    # using padding SAME to preserve the size (H, W) of the input
    conv = tf.nn.depthwise_conv2d(inp, k, strides=[1, 1, 1, 1], padding="SAME")
    return conv
You can use it simply in your custom loss (assuming a 4-D y_pred of shape [batch, height, width, channels]):
a = gaussian_blur(y_pred)
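As a quick sanity check (the shapes here are only illustrative), you can blur a random batch and verify that the "SAME" padding preserves the spatial size:
import tensorflow as tf
y_pred = tf.random.normal([2, 256, 256, 1])  # (batch, height, width, channels)
blurred = gaussian_blur(y_pred, shape=(3, 3), sigma=0.5)
print(blurred.shape)  # (2, 256, 256, 1)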
I am trying to efficiently implement the following kind of conv2d layer. The current implementation below works, I believe, but is very inefficient.
Input tensor of size
(batch_size x W x H x C_in)
Output tensor
(batch_size x W x H x C_out)
The layer takes two parameters: the number of units (C_u) and a list of K conv kernels (known ahead of time). Each conv kernel has size (W, H, 1, N), where N is the number of output channels (the number of input channels being 1). Note that different kernels in the same list can have different Ns!
First we apply a densely connected layer (trainable) transforming input shape to
(batch_size x W x H x C_u)
Then, I want to apply each of the convolutional kernels to each of the channels.
This results in C_u x K tensors, each of shape (batch_size x W x H x N)
I then want to take a max along N (so I get (batch_size x W x H x 1)) and concatenate everything to get
(batch_size x W x H x (C_u x K))
(so C_out = C_u x K)
Here is one way to implement this, but training is extremely slow and it does not play well with running on the GPU:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

class fixedConvLayer(layers.Dense):
    def __init__(self, units, conv_kernels, **params):
        params['units'] = units
        self.conv_kernels_numpy = conv_kernels
        super().__init__(**params)
        return

    def build(self, input_shape):
        super().build(input_shape)
        self.conv_kernels = [tf.convert_to_tensor(np.reshape(kernels, [3, 3, 1, -1]))
                             for kernels in self.conv_kernels_numpy]
        return

    def comp_filters(self, channel):
        return tf.concat([
            tf.math.reduce_max(tf.nn.conv2d(channel,
                                            filters=kernel,
                                            strides=1,
                                            padding='SAME'), axis=3, keepdims=True)
            for kernel in self.conv_kernels], axis=3)

    def call(self, inputs):
        # taken from the Dense definition and slightly modified
        inputs = tf.convert_to_tensor(inputs)
        rank = tf.rank(inputs)
        if rank != 4:
            raise ValueError('Rank expected to be 4')
        # Broadcasting is required for the inputs.
        outputs = tf.tensordot(inputs, self.kernel, [[3], [0]])
        # Reshape the output back to the original ndim of the input.
        shape = inputs.shape.as_list()
        output_shape = shape[:-1] + [self.units]
        outputs.set_shape(output_shape)
        if self.use_bias:
            outputs = tf.nn.bias_add(outputs, self.bias)
        if self.activation is not None:
            outputs = self.activation(outputs)
        # apply the conv filters
        channel_list = tf.split(outputs, num_or_size_splits=self.units, axis=-1)
        max_layers = tf.concat([self.comp_filters(channel) for channel in channel_list], axis=3)
        return max_layers
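For reference, here is a minimal usage sketch of this layer under TF 2 eager execution (the two random kernel banks below are hypothetical and only illustrate kernels with different numbers of output channels):
import numpy as np
import tensorflow as tf
# Two fixed 3x3 kernel banks with different output-channel counts (N = 4 and N = 2)
conv_kernels = [np.random.randn(3, 3, 1, 4).astype(np.float32),
                np.random.randn(3, 3, 1, 2).astype(np.float32)]
layer = fixedConvLayer(units=8, conv_kernels=conv_kernels)
x = tf.random.normal([1, 32, 32, 3])
y = layer(x)
print(y.shape)  # (1, 32, 32, 16), i.e. C_u * K = 8 * 2 output channels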
I am using Python 3 with Anaconda, and Keras with a TensorFlow backend. My goal is to create a network with a Conv layer that accepts a variable input size.
I found here a suggestion to use this code:
i = Input((None, None, 1))
o = Conv2D(1, 3, 3)(i)
model = Model(i, o)
model.compile('sgd', 'mse')
I have used it to create my own model with this code (I need a Flatten layer):
from keras.models import Model, Sequential
from keras.layers import Input, Conv2D, Flatten, Dense
from keras.losses import categorical_crossentropy
from keras.optimizers import SGD

model = Sequential()
I = Input((None, None, 1))
c = Conv2D(filters=1, kernel_size=(1, 1))(I)
f = Flatten()(c)
o = Dense(10, activation="softmax")(f)
m = Model(I, o)
m.compile(loss=categorical_crossentropy, optimizer=SGD(), metrics=["accuracy"])
And I keep getting this error
ValueError: The shape of the input to "Flatten" is not fully defined
(got (None, None, 1). Make sure to pass a complete "input_shape" or
"batch_input_shape" argument to the first layer in your model.
It seems like the issue is with the input shape for the Flatten layer; when I remove it, it's fine.
How can I make it play well with a variable input size?
Thanks
Dense needs fixed-size inputs/outputs because its number of weight variables must be fixed.
There are two solutions in your case:
1. Use GAP (Global Average Pooling) instead of Flatten. GAP's output size is the number of channels of the previous layer, so its size is fixed in your case.
2. Use an all-convolutional net that doesn't have a dense layer. In this case, the output of the net is two-dimensional, not one, so the shape of y should match that output.
The code below was added at Allen M's request.
Here is a code sample:
from keras.layers import Input, Conv2D, Dense, GlobalAveragePooling2D
from keras.models import Model

# The original number of Conv filters is one,
# but I set it to 16 to depict how GAP works.
# B/H/W mean BatchSize/Height/Width.
#1. using GAP
I = Input((None, None, 1)) # output shape=(B, H(None), W(None), 1)
c = Conv2D(filters=16, kernel_size=(1, 1))(I) # output shape=(B, H, W, 16)
f = GlobalAveragePooling2D()(c) # output shape=(B, 16) <- spatial data (H/W) is aggregated by averaging
o = Dense(10, activation="softmax")(f) # output shape = (B, 10)
m = Model(I, o)
#2. all conv
I = Input((None, None, 1)) # output shape=(B, H, W, 1)
c = Conv2D(filters=16, kernel_size=(1, 1))(I) # output shape=(B, H, W, 16)
o = Conv2D(filters=10, kernel_size=(1, 1), activation="softmax")(c)
# output shape=(B, H, W, 10)
m = Model(I, o)
# The output size of all conv is H * W * 10, where 10 is the number of classes.
# so the shape of y should be (B, H, W, 1) or (B, H, W) or (B, H, W, 10).
# That is pixel-wise classification or semantic segmentation.
The Flatten layer doesn't take an input size as an argument.
model = Sequential()
I = Input((None, None, 1))
c = Conv2D(filters=1, kernel_size=(1, 1))(I)
f = Flatten()
o = Dense(10, activation="softmax")(I)
m = Model(I, o)
m.compile(loss="categorical_crossentropy", optimizer=SGD(), metrics=["accuracy"])
This should solve your problem.
I think the problem is due to your variable input sizes. It says here that you can't vary input sizes if you're using a fully connected layer. See: How to train images when they have different sizes?
Here is what I have tried:
tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None,n_outputs])
layers = [tf.contrib.rnn.LSTMCell(num_units=n_neurons,
                                  activation=tf.nn.leaky_relu, use_peepholes=True)
          for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
tf.summary.histogram("outputs", rnn_outputs)
tf.summary.image("RNN",rnn_outputs)
I am getting the following error:
InvalidArgumentError: Tensor must be 4-D with last dim 1, 3, or 4, not [55413,4,100]
[[Node: RNN_1 = ImageSummary[T=DT_FLOAT, bad_color=Tensor<type: uint8 shape: [4] values: 255 0 0...>, max_images=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](RNN_1/tag, rnn/transpose_1)]]
Kindly help me get a visualization of the RNN outputs inside the LSTM model that I am trying to run. This will help me understand more accurately what the LSTM is doing.
You can plot each RNN output as an image, with one axis being the time and the other being the output. Here is a small example:
import tensorflow as tf
import numpy as np
n_steps = 100
n_inputs = 10
n_neurons = 10
n_layers = 3
x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
layers = [tf.contrib.rnn.LSTMCell(num_units=n_neurons,
                                  activation=tf.nn.leaky_relu, use_peepholes=True)
          for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, x, dtype=tf.float32)
# Time steps in horizontal axis, outputs in vertical axis, add last dimension for channel
rnn_out_imgs = tf.transpose(rnn_outputs, (0, 2, 1))[..., tf.newaxis]
out_img_sum = tf.summary.image("RNN", rnn_out_imgs, max_outputs=10)
init_op = tf.global_variables_initializer()
with tf.Session() as sess, tf.summary.FileWriter('log') as fw:
    sess.run(init_op)
    fw.add_summary(sess.run(out_img_sum, feed_dict={x: np.random.rand(10, n_steps, n_inputs)}))
You would get a visualization in which the brighter pixels represent a stronger activation, so even if it is hard to tell exactly what is causing what, you can at least see whether any meaningful patterns arise.
Your RNN output has the wrong shape for tf.summary.image. The tensor should be four-dimensional with the dimensions' sizes given by [batch_size, height, width, channels].
In your code, you're calling tf.summary.image with rnn_outputs, which has shape [55413, 4, 100]. Assuming your images are 55413-by-100 pixels in size and that each pixel contains 4 channels (RGBA), I'd use tf.reshape to reshape rnn_outputs to [1, 55413, 100, 4]. Then you should be able to call tf.summary.image without error.
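A small sketch of that suggestion, with the shapes taken from the error message (whether the 4 really is a channel dimension depends on your data):
# rnn_outputs has shape [55413, 4, 100]; reinterpret it as a single
# 55413-by-100 image with 4 channels so tf.summary.image accepts it
rnn_out_imgs = tf.reshape(rnn_outputs, [1, 55413, 100, 4])
img_summary = tf.summary.image("RNN", rnn_out_imgs)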
I don't think I can help you visualize the RNN's operation, but when I was learning about RNNs and LSTMs, I found this article very helpful.
My question is about the dimensionality of the variables of an elementary dynamic or static RNN.
nlu_input = tf.placeholder(tf.float32, shape=[4,1607,1])
cell = tf.nn.rnn_cell.BasicLSTMCell(80)
outts, states = tf.nn.dynamic_rnn(cell=cell, inputs=nlu_input, dtype=tf.float32)
Then tf.global_variables() returns the following list:
[<tf.Variable 'rnn/basic_lstm_cell/kernel:0' shape=(81, 320) dtype=float32_ref>, <tf.Variable 'rnn/basic_lstm_cell/bias:0' shape=(320,) dtype=float32_ref>]
I expected 'rnn/basic_lstm_cell/kernel:0' to have shape (80, 320), because 320 = 4 * 80 and the number of units is 80.
Why is the first dimension of the kernel incremented?
According to the TensorFlow implementation (see the BasicLSTMCell source code), the shape of the kernel is [input_depth + h_depth, 4 * num_units], where input_depth is your input vector dimension and h_depth is your hidden unit count. So your kernel shape is [1 + 80, 4 * 80].
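A quick check of that formula with the shapes from the question (input_depth = 1 comes from the placeholder's last dimension, num_units = 80):
input_depth, num_units = 1, 80
kernel_shape = (input_depth + num_units, 4 * num_units)
bias_shape = (4 * num_units,)
print(kernel_shape)  # (81, 320)
print(bias_shape)    # (320,)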