I am confused about how to replicate Keras (TensorFlow) convolutions in PyTorch.
In Keras, I can do something like this (the input size is (256, 237, 1, 21) and the output size is (256, 237, 1, 1024)):
import tensorflow as tf
x = tf.random.normal((256,237,1,21))
y = tf.keras.layers.Conv1D(filters=1024, kernel_size=5,padding="same")(x)
print(y.shape)
(256, 237, 1, 1024)
However, in PyTorch, when I try to do the same thing I get a different output size:
import torch
import torch.nn as nn
x = torch.randn(256,237,1,21)
m = nn.Conv1d(in_channels=237, out_channels=1024, kernel_size=(1,5))
y = m(x)
print(y.shape)
torch.Size([256, 1024, 1, 17])
I want PyTorch to give me the same output size that Keras does.
This previous question seems to imply that Keras filters are PyTorch's out_channels, but that's what I have. I tried adding padding=(0,503) in PyTorch, but that gives me torch.Size([256, 1024, 1, 1023]), which is still not correct. It also takes much longer than Keras does, so I suspect I have assigned a parameter incorrectly.
How can I replicate what Keras did with convolution in PyTorch?
In TensorFlow, tf.keras.layers.Conv1D takes a tensor of shape batch_shape + (steps, input_dim), which means that what is commonly known as channels appears on the last axis. For instance, in 2D convolution you would have (batch, height, width, channels). This is different from PyTorch, where the channel dimension comes right after the batch axis: torch.nn.Conv1d takes inputs of shape (batch, channel, length). So you will need to permute two axes.
For torch.nn.Conv1d:
in_channels is the number of channels in the input tensor
out_channels is the number of filters, i.e. the number of channels the output will have
stride is the step size of the convolution
padding is the zero-padding added to both sides
Older PyTorch releases have no padding='same' option, so you need to choose the padding yourself. Here stride=1, so padding must equal kernel_size//2 (i.e. padding=2) in order to maintain the length of the tensor.
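To see why padding=2 preserves the length here, the output-length formula from the torch.nn.Conv1d documentation can be written out in plain Python. This is only a sketch: the helper name conv1d_output_length is made up for illustration and it assumes symmetric zero-padding.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution with symmetric zero-padding."""
    # Span covered by one kernel application, including dilation gaps.
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (length + 2 * padding - effective_kernel) // stride + 1


# kernel_size=5 with padding=2 keeps the 237 steps intact.
print(conv1d_output_length(237, kernel_size=5, padding=2))  # 237
# Without padding, a length of 21 shrinks to 17, as in the question's output.
print(conv1d_output_length(21, kernel_size=5))              # 17
```

For odd kernel sizes and stride=1, padding=kernel_size//2 always gives an output length equal to the input length.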
In your example, since x has a shape of (256, 237, 1, 21), in TensorFlow's terminology it will be considered as an input with:
a batch shape of (256, 237),
steps=1, so the length of your 1D input is 1,
21 input channels.
Whereas in PyTorch, x of shape (256, 237, 1, 21) would be:
batch shape of (256, 237),
1 input channel
a length of 21.
I have kept the input in both examples below (TensorFlow vs. PyTorch) as x.shape=(256, 237, 21), assuming 256 is the batch size, 237 is the length of the input sequence, and 21 is the number of channels (i.e. the input dimension, which I see as the dimension at each timestep).
In TensorFlow:
>>> x = tf.random.normal((256, 237, 21))
>>> m = tf.keras.layers.Conv1D(filters=1024, kernel_size=5, padding="same")
>>> y = m(x)
>>> y.shape
TensorShape([256, 237, 1024])
In PyTorch:
>>> x = torch.randn(256, 237, 21)
>>> m = nn.Conv1d(in_channels=21, out_channels=1024, kernel_size=5, padding=2)
>>> y = m(x.permute(0, 2, 1))
>>> y.permute(0, 2, 1).shape
torch.Size([256, 237, 1024])
So in the latter, you would simply work with x = torch.randn(256, 21, 237)...
PyTorch now supports same convolution out of the box; take a look at the documentation: [Same convolution][1]
class InceptionNet(nn.Module):
    def __init__(self, in_channels, in_1x1, in_3x3reduce, in_3x3, in_5x5reduce, in_5x5, in_1x1pool):
        super(InceptionNet, self).__init__()
        self.incep_1 = ConvBlock(in_channels, in_1x1, kernel_size=1, padding='same')
Note that a same convolution only supports the default stride value of 1; any other stride won't work.
[1]: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
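As a rough sketch of the same idea applied to the Conv1D case from the question, the snippet below uses padding='same' directly on nn.Conv1d. It assumes a PyTorch release that accepts string padding, and the batch size of 4 is arbitrary (chosen only to keep the example light).

```python
import torch
import torch.nn as nn

# Keras-style layout: (batch, length, channels).
x = torch.randn(4, 237, 21)

conv = nn.Conv1d(in_channels=21, out_channels=1024, kernel_size=5, padding='same')

# Move channels to axis 1 for PyTorch, convolve, then move them back.
y = conv(x.permute(0, 2, 1)).permute(0, 2, 1)
print(y.shape)  # torch.Size([4, 237, 1024])

# padding='same' is rejected when the stride is not 1.
try:
    nn.Conv1d(21, 1024, kernel_size=5, stride=2, padding='same')
except ValueError as err:
    print("strided 'same' convolution rejected:", err)
```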
At a certain stage in a ResNet, I have 6 features per image, i.e. each example has shape 1x8x8x6. I want to convolve each feature with 4 constant filters (DWT) of size 1x2x2x1 with a stride of 2, to get 24 features in the next layer so that the image becomes 1x4x4x24. However, I am unable to use tf.nn.conv2d or tf.nn.convolution for this purpose: conv2d says the fourth dimension of the input must equal the third dimension of the filter. How can I do this? I tried it for the first filter only, but even this doesn't work:
import numpy as np
import tensorflow as tf
x_in = np.random.randn(1,8,8,6)
kernel_in = np.array([[[[1],[1]],[[1],[1]]]])
kernel_in.shape
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
tf.nn.convolution(x, kernel, strides=[1, 1, 1, 1], padding='VALID')
Try it this way:
x_in = np.random.randn(1,8,8,6) # [batch, in_height, in_width, in_channels]
kernel_in = np.ones((2,2,6,24)) # [filter_height, filter_width, in_channels, out_channels]
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
tf.nn.conv2d(x, kernel, strides=[1, 2, 2, 1], padding='VALID')
# <tf.Tensor: shape=(1, 4, 4, 24), dtype=float32, numpy=....>
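Note that the all-ones kernel above mixes all 6 input channels into every output channel. If the goal is to filter each channel separately with the 4 fixed filters, that is a depthwise convolution (TensorFlow provides tf.nn.depthwise_conv2d for this, with a filter of shape [filter_height, filter_width, in_channels, channel_multiplier]). The pure-Python sketch below only illustrates the arithmetic and the resulting shape; the Haar-style filter values and the helper name depthwise_conv_stride2 are assumptions, not taken from the question.

```python
import random

# Four 2x2 Haar-style filters (assumed values; substitute your own DWT kernels).
FILTERS = [
    [[0.5, 0.5], [0.5, 0.5]],     # average
    [[0.5, 0.5], [-0.5, -0.5]],   # difference between rows
    [[0.5, -0.5], [0.5, -0.5]],   # difference between columns
    [[0.5, -0.5], [-0.5, 0.5]],   # diagonal difference
]


def depthwise_conv_stride2(image, filters):
    """Apply every 2x2 filter to every channel separately, with stride 2.

    image is an H x W x C nested list; the result has shape
    (H // 2) x (W // 2) x (C * len(filters)).
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    out = []
    for i in range(0, height - 1, 2):
        row = []
        for j in range(0, width - 1, 2):
            pixel = []
            for c in range(channels):
                for f in filters:
                    pixel.append(sum(f[di][dj] * image[i + di][j + dj][c]
                                     for di in range(2) for dj in range(2)))
            row.append(pixel)
        out.append(row)
    return out


random.seed(0)
image = [[[random.gauss(0, 1) for _ in range(6)] for _ in range(8)] for _ in range(8)]
result = depthwise_conv_stride2(image, FILTERS)
print(len(result), len(result[0]), len(result[0][0]))  # 4 4 24
```

Each input channel contributes 4 consecutive output channels, which is how 6 features become 24.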
A simple example of how to fill the filters of a Keras Conv2D layer with predefined values in TF2:
import tensorflow as tf
from tensorflow.keras import layers, models
model = models.Sequential()
# one 3x3 filter
model.add(layers.Conv2D(1, (3, 3), input_shape=(None, None, 1)))
# access to the target layer
layer = model.layers[0]
current_w, current_bias = layer.get_weights() # see the current weights
new_w = tf.constant([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])
new_w = tf.reshape(new_w, current_w.shape) # match the shape of the layer's kernel
new_bias = tf.constant([0])
layer.set_weights([new_w, new_bias])
model.summary()
# let's see ..
tf.print(model.layers[0].get_weights())
I have a large custom model built with the new TensorFlow 2.0, mixing Keras and TensorFlow. I want to save it (architecture and weights).
Exact command to reproduce:
import tensorflow as tf
OUTPUT_CHANNELS = 3
def downsample(filters, size, apply_batchnorm=True):
    initializer = tf.random_normal_initializer(0., 0.02)
    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                               kernel_initializer=initializer, use_bias=False))
    if apply_batchnorm:
        result.add(tf.keras.layers.BatchNormalization())
    result.add(tf.keras.layers.LeakyReLU())
    return result

def upsample(filters, size, apply_dropout=False):
    initializer = tf.random_normal_initializer(0., 0.02)
    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                        padding='same',
                                        kernel_initializer=initializer,
                                        use_bias=False))
    result.add(tf.keras.layers.BatchNormalization())
    if apply_dropout:
        result.add(tf.keras.layers.Dropout(0.5))
    result.add(tf.keras.layers.ReLU())
    return result

def Generator():
    down_stack = [
        downsample(64, 4, apply_batchnorm=False), # (bs, 128, 128, 64)
        downsample(128, 4), # (bs, 64, 64, 128)
        downsample(256, 4), # (bs, 32, 32, 256)
        downsample(512, 4), # (bs, 16, 16, 512)
        downsample(512, 4), # (bs, 8, 8, 512)
        downsample(512, 4), # (bs, 4, 4, 512)
        downsample(512, 4), # (bs, 2, 2, 512)
        downsample(512, 4), # (bs, 1, 1, 512)
    ]
    up_stack = [
        upsample(512, 4, apply_dropout=True), # (bs, 2, 2, 1024)
        upsample(512, 4, apply_dropout=True), # (bs, 4, 4, 1024)
        upsample(512, 4, apply_dropout=True), # (bs, 8, 8, 1024)
        upsample(512, 4), # (bs, 16, 16, 1024)
        upsample(256, 4), # (bs, 32, 32, 512)
        upsample(128, 4), # (bs, 64, 64, 256)
        upsample(64, 4), # (bs, 128, 128, 128)
    ]
    initializer = tf.random_normal_initializer(0., 0.02)
    last = tf.keras.layers.Conv2DTranspose(OUTPUT_CHANNELS, 4,
                                           strides=2,
                                           padding='same',
                                           kernel_initializer=initializer,
                                           activation='tanh') # (bs, 256, 256, 3)
    concat = tf.keras.layers.Concatenate()
    inputs = tf.keras.layers.Input(shape=[None,None,3])
    x = inputs
    # Downsampling through the model
    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)
    skips = reversed(skips[:-1])
    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        x = concat([x, skip])
    x = last(x)
    return tf.keras.Model(inputs=inputs, outputs=x)

generator = Generator()
generator.summary()
generator.save('generator.h5')
generator_loaded = tf.keras.models.load_model('generator.h5')
I manage to save the model with:
generator.save('generator.h5')
But when I try to load it with:
generator_loaded = tf.keras.models.load_model('generator.h5')
It never ends (no error message). Maybe the model is too large? I tried to save as JSON with model.to_json() as well as the full API tf.keras.models.save_model(), but same problem, impossible to load it (or at least far too long).
Same problem on Windows/Linux and with/without GPU.
Saving and restoring work well with pure Keras and a simple model.
Edit
Saving weights and then loading them works well, but it's impossible to load the model structure.
The model above is the one I use to reproduce the bug; it comes from the Pix2Pix example (https://www.tensorflow.org/alpha/tutorials/generative/pix2pix).
I also opened an issue on the TensorFlow GitHub: https://github.com/tensorflow/tensorflow/issues/28281
As of TensorFlow release 2.0.0 there is a Keras/TF-agnostic way of saving models using tf.saved_model:
....
model.fit(images, labels , epochs=30, validation_data=(images_val, labels_val), verbose=1)
tf.saved_model.save( model, "path/to/model_dir" )
You can then load with
loaded_model = tf.saved_model.load("path/to/model_dir")
Try instead to save the model as:
model.save('model_name.model')
Then load it with:
model = tf.keras.models.load_model('model_name.model')
I found a temporary solution.
It seems that the issue occurs with the sequential API, tf.keras.Sequential. When the model is built with the functional API instead, tf.keras.models.load_model manages to load the saved model.
I hope they will fix this issue in the final release. Have a look at the issue I raised on GitHub: https://github.com/tensorflow/tensorflow/issues/28281.
Cheers,
I managed to save and load custom models by implementing similar functions to the Sequential model in Keras.
The key functions are CustomModel.get_config() and CustomModel.from_config(), which should also exist on any of your custom layers (similar to the functions below, but see the Keras layers if you want a better understanding):
# Module-level imports used by the methods below
import copy
from tqdm import tqdm

# In the CustomModel class
def get_config(self):
    layer_configs = []
    for layer in self.layers:
        layer_configs.append({
            'class_name': layer.__class__.__name__,
            'config': layer.get_config()
        })
    config = {
        'name': self.name,
        'layers': copy.deepcopy(layer_configs),
        "arg1": self.arg1,
        ...
    }
    if self._build_input_shape:
        config['build_input_shape'] = self._build_input_shape
    return config

@classmethod
def from_config(cls, config, custom_objects=None):
    from tensorflow.python.keras import layers as layer_module
    if custom_objects is None:
        custom_objects = {'CustomLayer1Class': CustomLayer1Class, ...}
    else:
        custom_objects = dict(custom_objects, **{'CustomLayer1Class': CustomLayer1Class, ...})
    if 'name' in config:
        name = config['name']
        build_input_shape = config.get('build_input_shape')
        layer_configs = config['layers']
    else:
        name = None
        build_input_shape = None
        layer_configs = config
    model = cls(name=name,
                arg1=config['arg1'],
                should_build_graph=False,
                ...)
    for layer_config in tqdm(layer_configs, 'Loading Layers'):
        layer = layer_module.deserialize(layer_config,
                                         custom_objects=custom_objects)
        model.add(layer) # This function looks at the name of the layers to place them in the right order
    if not model.inputs and build_input_shape:
        model.build(build_input_shape)
    if not model._is_graph_network:
        # Still needs to be built when passed input data.
        model.built = False
    return model
I also added a CustomModel.add() function that adds layers one by one from their config, and a parameter should_build_graph=False that makes sure the graph is not built in __init__() when calling cls().
Then the CustomModel.save() function looks like this:
def save(self, filepath, overwrite=True, include_optimizer=True, **kwargs):
    from tensorflow.python.keras.models import save_model
    save_model(self, filepath, overwrite, include_optimizer)
After that you can save and load using:
model.save("model.h5")
new_model = keras.models.load_model('model.h5',
                                    custom_objects={
                                        'CustomModel': CustomModel,
                                        'CustomLayer1Class': CustomLayer1Class,
                                        ...
                                    })
But somehow this approach seems to be quite slow. The following approach, on the other hand, is almost 30x faster; I'm not sure why:
model.save_weights("weights.h5")
config = model.get_config()
reinitialized_model = CustomModel.from_config(config)
reinitialized_model.load_weights("weights.h5")
It works, but it seems quite hacky. Maybe future versions of TF2 will make the process clearer.
One other method of saving a trained model is to use the pickle module in python.
import pickle
with open(filename, 'wb') as f:
    pickle.dump(model, f)
To load the pickled model:
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
The file extension is arbitrary; .sav is one common choice.
I am trying to import weights saved from a Tensorflow model to PyTorch. So far the results have been very similar. I ran into a snag when the model calls for conv2d with stride=2.
To verify the mismatch, I set up a very simple comparison between TF and PyTorch. First, I compare conv2d with stride=1.
import tensorflow as tf
import numpy as np
import torch
import torch.nn.functional as F
np.random.seed(0)
sess = tf.Session()
# Create random weights and input
weights = torch.empty(3, 3, 3, 8)
torch.nn.init.constant_(weights, 5e-2)
x = np.random.randn(1, 3, 10, 10)
weights_tf = tf.convert_to_tensor(weights.numpy(), dtype=tf.float32)
# PyTorch adopts [outputC, inputC, kH, kW]
weights_torch = torch.Tensor(weights.permute((3, 2, 0, 1)))
# Tensorflow defaults to NHWC
x_tf = tf.convert_to_tensor(x.transpose((0, 2, 3, 1)), dtype=tf.float32)
x_torch = torch.Tensor(x)
# TF Conv2D
tf_conv2d = tf.nn.conv2d(x_tf,
weights_tf,
strides=[1, 1, 1, 1],
padding="SAME")
# PyTorch Conv2D
torch_conv2d = F.conv2d(x_torch, weights_torch, padding=1, stride=1)
sess.run(tf.global_variables_initializer())
tf_result = sess.run(tf_conv2d)
diff = np.mean(np.abs(tf_result.transpose((0, 3, 1, 2)) - torch_conv2d.detach().numpy()))
print('Mean of Abs Diff: {0}'.format(diff))
The result of this execution is:
Mean of Abs Diff: 2.0443112092038973e-08
When I change stride to 2, the results start to vary.
# TF Conv2D
tf_conv2d = tf.nn.conv2d(x_tf,
weights_tf,
strides=[1, 2, 2, 1],
padding="SAME")
# PyTorch Conv2D
torch_conv2d = F.conv2d(x_torch, weights_torch, padding=1, stride=2)
The result of this execution is:
Mean of Abs Diff: 0.2104552686214447
According to PyTorch documentation, conv2d uses zero-padding defined by the padding argument. Thus, zeros are added to the left, top, right, and bottom of the input in my example.
If PyTorch simply adds padding on both sides based on the input parameter, it should be easy to replicate in Tensorflow.
# Manually add padding - consistent with PyTorch
paddings = tf.constant([[0, 0], [1, 1], [1, 1], [0, 0]])
x_tf = tf.convert_to_tensor(x.transpose((0, 2, 3, 1)), dtype=tf.float32)
x_tf = tf.pad(x_tf, paddings, "CONSTANT")
# TF Conv2D
tf_conv2d = tf.nn.conv2d(x_tf,
weights_tf,
strides=[1, 2, 2, 1],
padding="VALID")
The result of this comparison is:
Mean of Abs Diff: 1.6035047067930464e-08
What this tells me is that if I am somehow able to replicate the default padding behavior from Tensorflow into PyTorch, then my results will be similar.
This question inspected the behavior of padding in Tensorflow. TF documentation explains how padding is added for "SAME" convolutions. I discovered these links while writing this question.
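The "SAME" rule described in the TensorFlow documentation can be sketched in a few lines of plain Python for one spatial axis. The helper names tf_same_padding and same_output_size are made up for illustration, and dilation is assumed to be 1.

```python
import math


def tf_same_padding(in_size, kernel_size, stride):
    """Return (pad_before, pad_after) along one axis for padding="SAME"."""
    if in_size % stride == 0:
        total = max(kernel_size - stride, 0)
    else:
        total = max(kernel_size - (in_size % stride), 0)
    pad_before = total // 2
    pad_after = total - pad_before  # an odd leftover pixel goes at the end
    return pad_before, pad_after


def same_output_size(in_size, stride):
    """Output size along one axis for padding="SAME"."""
    return math.ceil(in_size / stride)


print(tf_same_padding(10, kernel_size=3, stride=1))  # (1, 1)
print(tf_same_padding(10, kernel_size=3, stride=2))  # (0, 1)
print(same_output_size(10, stride=2))                # 5
```

With stride=1 the padding is symmetric, (1, 1), which is exactly what padding=1 does in PyTorch, so the results agree. With stride=2 TensorFlow pads (0, 1) while padding=1 in PyTorch pads (1, 1); the output size is the same, but every window is shifted by one pixel, which explains the mismatch.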
Now that I know the padding strategy of Tensorflow, I can implement it in PyTorch.
To replicate the behavior, padding sizes are calculated as described in the Tensorflow documentation. Here, I test the padding behavior by setting stride=2 and padding the PyTorch input.
import tensorflow as tf
import numpy as np
import torch
import torch.nn.functional as F
np.random.seed(0)
sess = tf.Session()
# Create random weights and input
weights = torch.empty(3, 3, 3, 8)
torch.nn.init.constant_(weights, 5e-2)
x = np.random.randn(1, 3, 10, 10)
weights_tf = tf.convert_to_tensor(weights.numpy(), dtype=tf.float32)
weights_torch = torch.Tensor(weights.permute((3, 2, 0, 1)))
# Tensorflow padding behavior. Assuming that kH == kW to keep this simple.
stride = 2
if x.shape[2] % stride == 0:
    pad = max(weights.shape[0] - stride, 0)
else:
    pad = max(weights.shape[0] - (x.shape[2] % stride), 0)
if pad % 2 == 0:
    pad_val = pad // 2
    padding = (pad_val, pad_val, pad_val, pad_val)
else:
    pad_val_start = pad // 2
    pad_val_end = pad - pad_val_start
    padding = (pad_val_start, pad_val_end, pad_val_start, pad_val_end)
x_tf = tf.convert_to_tensor(x.transpose((0, 2, 3, 1)), dtype=tf.float32)
x_torch = torch.Tensor(x)
x_torch = F.pad(x_torch, padding, "constant", 0)
# TF Conv2D
tf_conv2d = tf.nn.conv2d(x_tf,
weights_tf,
strides=[1, stride, stride, 1],
padding="SAME")
# PyTorch Conv2D
torch_conv2d = F.conv2d(x_torch, weights_torch, padding=0, stride=stride)
sess.run(tf.global_variables_initializer())
tf_result = sess.run(tf_conv2d)
diff = np.mean(np.abs(tf_result.transpose((0, 3, 1, 2)) - torch_conv2d.detach().numpy()))
print('Mean of Abs Diff: {0}'.format(diff))
The output is:
Mean of Abs Diff: 2.2477470551507395e-08
I wasn't quite sure why this was happening when I started writing this question, but a bit of reading clarified this very quickly. I hope this example can help others.
I'm using Keras with TF backend. Recently, when using the functional API to make "hybrid" models, it seemed to me that Keras requires me to feed values that it shouldn't need.
As a background, I am trying to implement a conditional GAN in Keras. My implementation has a generator and a discriminator. As an example, the generator accepts (20, 20, 1) inputs and returns (20, 20, 1) outputs. These are stacked by channel to produce a (20, 20, 2) input to the discriminator. The discriminator is supposed to decide whether it is seeing a ground-truth translation of the original (20, 20, 1) image or a translation by the generator. This is represented by 0=fake, 1=real.
By itself, the discriminator is just a CNN for binary classification. Therefore, it can be trained by feeding data points with inputs of shape (20, 20, 2) and outputs in {0,1}. Therefore, if I write something like:
# <disc> is the discriminator
arbitrary_input = np.full(shape=(5, 20, 20, 2), fill_value=0.5)
arbitrary_labels = np.array([1, 1, 0, 0, 1])
disc.fit(arbitrary_input, arbitrary_labels, epochs=5)
training will proceed without errors (obviously this is a useless dataset, though).
However, when I insert the discriminator into the generator-discriminator stack:
# <disc> is the discriminator, <gen> is the generator
input = Input(shape=(20, 20, 1), name='stack_input')
gen_output = gen(input)
pair = Concatenate(axis=FEATURES_AXIS)([input, gen_output])
disc_output = disc(gen_output)
stack = Model(input, disc_output)
stack.compile(optimizer='adam', loss='binary_crossentropy')
arbitrary_input = np.full(shape=(5, 20, 20, 2), fill_value=0.5)
arbitrary_labels = np.array([1, 1, 0, 0, 1])
disc.fit(arbitrary_input, arbitrary_labels, epochs=5)
suddenly I need to feed an extra placeholder. I get this error message on disc.fit():
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'stack_input' with dtype float
[[Node: stack_input = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
As you can see by the name, this is the input to the hybrid/stacked model. I haven't changed the discriminator at all, I have only included it in another model. Therefore disc.fit() should still work, right?
There's a workaround available by freezing the weights of the generator and using fit() on the full stack, I think, but I do not understand why the method above doesn't work.
Is it perhaps some issue with scoping?
Edit: The discriminator is really just a simple CNN. It is initialized with disc = pix2pix_discriminator(input_shape=(20, 20, 2), n_filters=(32, 64)). The function in question is:
def pix2pix_discriminator(input_shape, n_filters, kernel_size=4, strides=2, padding='same', alpha=0.2):
    x = Input(shape=input_shape, name='disc_input')
    # first layer
    h = Conv2D(filters=n_filters[0],
               kernel_size=kernel_size,
               strides=strides,
               padding=padding,
               data_format=DATA_FORMAT)(x)
    # no BatchNorm
    h = LeakyReLU(alpha=alpha)(h)
    for i in range(1, len(n_filters)):
        h = Conv2D(filters=n_filters[i],
                   kernel_size=kernel_size,
                   strides=strides,
                   padding=padding,
                   data_format=DATA_FORMAT)(h)
        h = BatchNorm(axis=FEATURES_AXIS)(h)
        h = LeakyReLU(alpha=alpha)(h)
    h_flatten = Flatten()(h) # required for the upcoming Dense layer
    y_pred = Dense(units=1, activation='sigmoid')(h_flatten) # binary output
    discriminator = Model(inputs=x, outputs=y_pred)
    discriminator.compile(optimizer='adam',
                          loss='binary_crossentropy',
                          metrics=['accuracy'])
    return discriminator