I am using a CLIP model that has two sub-models. One model's output is (20, 128, 256) and the other one's output is (20, 256).
image_model_output = (20, 256)
text_model_output = (20, 128, 256)
I use the following to calculate the logits:
logits = (tf.matmul(caption_embeddings, image_embeddings, transpose_b=True))
so it works out to `(20, 128, 256) @ (256, 20)` and its output will be `(20, 128, 20)`
Similarly, I calculate the image similarity like this
images_similarity = tf.matmul(
    image_embeddings, image_embeddings, transpose_b=True
)
(Output) --> (20, 256) @ (256, 20) = (20, 20)
and this
captions_similarity = tf.matmul(
    caption_embeddings, caption_embeddings, transpose_b=True
)
(Output) --> (20, 128, 256) @ (20, 256, 128) = (20, 128, 128)
The problem arises here
targets = keras.activations.softmax(
    (captions_similarity + images_similarity) / (2 * self.temperature)
)
So do I need to change the activation function, or is there some way to add these 3D tensors with different shapes?
Sorry for the very technical explanation, but people with a solid deep learning and machine learning background will understand.
NOTE: After adding an axis with tf.expand_dims(image_embeddings, axis=1), the part below runs successfully:
targets = keras.activations.softmax(
    (captions_similarity + images_similarity) / (2 * self.temperature)
)
However, after this there is a loss function like the one below:
captions_loss = keras.losses.categorical_crossentropy(
    y_true=targets, y_pred=logits, from_logits=True
)
which generates this error
ValueError: Shapes (2, 128, 128) and (2, 128, 1) are incompatible
Is it possible to solve this error?
To handle the above error, I used a different loss function. I changed the code from
captions_loss = keras.losses.categorical_crossentropy(
    y_true=targets, y_pred=logits, from_logits=True
)
To
captions_loss = keras.losses.kl_divergence(
    y_true=targets, y_pred=logits
)
To save other developers' time, I have answered my own question. I am happy to discuss it further if anyone is interested.
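For reference, here is a consolidated sketch of the adjusted computation described above (the embedding tensors are random stand-ins and the temperature value is just an assumed example):

import tensorflow as tf
from tensorflow import keras

# Random stand-ins matching the shapes described above.
caption_embeddings = tf.random.normal((20, 128, 256))   # text model output
image_embeddings = tf.random.normal((20, 256))           # image model output
temperature = 0.05                                        # assumed value

# Give the image embeddings a broadcastable middle axis: (20, 1, 256).
image_embeddings = tf.expand_dims(image_embeddings, axis=1)

logits = tf.matmul(caption_embeddings, image_embeddings, transpose_b=True)                  # (20, 128, 1)
images_similarity = tf.matmul(image_embeddings, image_embeddings, transpose_b=True)         # (20, 1, 1)
captions_similarity = tf.matmul(caption_embeddings, caption_embeddings, transpose_b=True)   # (20, 128, 128)

# (20, 128, 128) + (20, 1, 1) broadcasts cleanly now.
targets = keras.activations.softmax(
    (captions_similarity + images_similarity) / (2 * temperature)
)

# categorical_crossentropy rejects (20, 128, 128) vs (20, 128, 1),
# so KL divergence is used instead, as described above.
captions_loss = keras.losses.kl_divergence(y_true=targets, y_pred=logits)
print(captions_loss.shape)  # (20, 128)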
I am trying to build an Inception model as described here:
https://towardsdatascience.com/deep-learning-for-time-series-classification-inceptiontime-245703f422db
It all works so far, but when I try to implement the shortcut layer and add the two tensors together, I get an error.
Here is my shortcut code:
def shortcut_layer(inputs, z_interception):
    print(inputs.shape)
    inputs = keras.layers.Conv1D(filters=int(z_interception.shape[-1]), kernel_size=1,
                                 padding='same', use_bias=False)(inputs)
    print(z_interception.shape[-1])
    print(inputs.shape, z_interception.shape)
    inputs = keras.layers.BatchNormalization()(inputs)
    z = keras.layers.Add()([inputs, z_interception])
    print('zshape: ', z.shape)
    return keras.layers.Activation('relu')(z)
The output is as follows:
(None, 160, 8)
128
(None, 160, 128) (None, 160, 128)
The output is exactly as I expect it to be, but I still get the error:
ValueError: Operands could not be broadcast together with shapes (160, 128) (160, 8)
which doesn't make sense to me, since I am trying to add two tensors of shape (None, 160, 128).
I hope someone can help me with this. Thank you in advance.
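For what it's worth, a self-contained sketch of the same block (with hypothetical dummy tensors standing in for the real inputs) builds cleanly, which suggests the mismatch comes from the tensors actually passed in by the surrounding code:

import tensorflow as tf
from tensorflow import keras

def shortcut_layer(inputs, z_interception):
    # 1x1 convolution so the residual branch matches the channel count of z_interception.
    inputs = keras.layers.Conv1D(filters=int(z_interception.shape[-1]), kernel_size=1,
                                 padding='same', use_bias=False)(inputs)
    inputs = keras.layers.BatchNormalization()(inputs)
    z = keras.layers.Add()([inputs, z_interception])
    return keras.layers.Activation('relu')(z)

x_in = keras.Input(shape=(160, 8))                      # matches the printed input shape
z = keras.layers.Conv1D(128, 3, padding='same')(x_in)   # stand-in for the inception output
out = shortcut_layer(x_in, z)
print(out.shape)  # (None, 160, 128)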
I was trying to train my model for EMNIST prediction using PyTorch.
Edit: Here's the link to the Colab notebook for the problem.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(28, 64, (5, 5), padding=2)
        self.conv1_bn = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 128, 2, padding=2)
        self.fc1 = nn.Linear(2048, 1024)
        self.dropout = nn.Dropout(0.3)
        self.fc2 = nn.Linear(1024, 512)
        self.bn = nn.BatchNorm1d(1)
        self.fc3 = nn.Linear(512, 128)
        self.fc4 = nn.Linear(128, 47)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = self.conv1_bn(x)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 2048)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        x = x.view(-1, 1, 512)
        x = self.bn(x)
        x = x.view(-1, 512)
        x = self.fc3(x)
        x = self.fc4(x)
        return F.log_softmax(x, dim=1)
I am getting the error shown below whenever I train my model.
<ipython-input-11-07c68cf1cac2> in forward(self, x)
24 def forward(self, x):
25 x = F.relu(self.conv1(x))
---> 26 x = F.max_pool2d(x, 2, 2)
27 x = self.conv1_bn(x)
RuntimeError: Given input size: (64x28x1). Calculated output size: (64x14x0). Output size is too small
I searched for solutions and found that I should transform the data first, so I tried transforming it with the most common suggestion:
transform_valid = transforms.Compose([
    transforms.ToTensor(),
])
But then I get the warning mentioned below. Maybe the problem lies in the transformation part.
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py:469: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1595629403081/work/torch/csrc/utils/tensor_numpy.cpp:141.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
I wanted to make that particular NumPy array writable using "ndarray.setflags(write=None, align=None, uic=None)", but I'm not able to figure out where and which array I should make writable, as I'm directly loading the dataset using ->
"datasets.EMNIST(root, split="balanced", train=False, download=True, transform=transform_valid)"
Welcome to Stack Overflow!
Your problem is not related to the ToTensor transform; this error is raised because of the dimensions of the tensor you feed into your max pool: the error clearly states that you are trying to max-pool a tensor in which one of the dimensions is 1 (64, 28, 1), which would produce a tensor with a dimension of 0 (64, 14, 0), and that makes no sense.
You need to check the dimensions of the tensors you input in your model. They are definitely too small. Maybe you made a mistake with a view somewhere (hard to tell without a minimal reproducible example).
If I had to guess, you start with a tensor of size 28x28x1 (typical MNIST) and put it into a convolution that expects a tensor of dims BxCxWxH (batch_size, channels, width, height), i.e. something like (B, 1, 28, 28), but you confuse the width (28) with the input channels (nn.Conv2d(->28<-, 64, (5, 5), padding=2)).
I believe you want your first layer to be nn.Conv2d(1, 64, (5, 5), padding=2), and you need to resize your tensors to give them the shape (B, 1, 28, 28) (the value of B is up to you) before giving them to the network.
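A minimal sketch of that suggestion (the batch size of 8 is an arbitrary assumption):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical batch of EMNIST-style images: (B, 1, 28, 28).
x = torch.randn(8, 1, 28, 28)
# If the tensor arrives as (B, 28, 28), x = x.unsqueeze(1) restores the channel dim.

conv1 = nn.Conv2d(1, 64, (5, 5), padding=2)   # in_channels=1, not 28
out = F.max_pool2d(F.relu(conv1(x)), 2, 2)
print(out.shape)  # torch.Size([8, 64, 14, 14])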
Side note: the warning about writable NumPy arrays is completely unrelated; it just means that PyTorch may overwrite the "non-writable" data of your NumPy array. If you don't care about this NumPy array being modified, you can ignore the warning.
I intend to use the concept of skip connections in my experiment. Basically, in my pipeline, the feature maps that come after Conv2D are going to be stacked or concatenated. But the feature maps have different shapes, and trying to stack them into one tensor gave me an error. Does anyone know a possible way of doing this correctly in TensorFlow? Any thoughts or ideas to make this happen? Thanks.
Idea flowchart
Here is the flowchart of the pipeline I want to build:
My case is a little different because an extra building block is used after Conv2D, and its output is now a feature map of 15x15x64, and so on. I want to stack those feature maps into one and then feed it to Conv2D again.
My attempt:
This is my reproducible attempt:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Activation, Conv2D, Flatten, MaxPooling2D, BatchNormalization
inputs = tf.keras.Input(shape=(32, 32, 3))
x = inputs
x = Conv2D(32, (3, 3), input_shape=(32,32,3))(x)
x = BatchNormalization(axis=-1)(x)
x = Activation('relu')(x)
fm1 = MaxPooling2D(pool_size=(2,2))(x)
x = Conv2D(32,(3, 3), input_shape=(15,15,32))(fm1)
x = BatchNormalization(axis=-1)(x)
x = Activation('relu')(x)
fm2 = MaxPooling2D(pool_size=(2,2))(x)
concatted = tf.keras.layers.Concatenate(axis=1)([fm1, fm2])
but this way I ended up with the following error: ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 15, 15, 32), (None, 6, 6, 32)]. I am not sure what the correct way is to stack feature maps with different shapes. How can we make this right? Any thoughts?
Desired output
In my actual model, the shapes of the feature maps are TensorShape([None, 15, 15, 128]) and TensorShape([None, 6, 6, 128]). I need to find a way to merge or stack them into one. Ideally, the shape of the concatenated or stacked feature map would be [None, 21, 21, 128]. Is there any way of stacking them into one? Any ideas?
What you're trying to achieve doesn't work mathematically. Let me illustrate. Take the simple 1D problem (like 1D convolution). You have a (None, 64, 128) (fm1) sized output and a (None, 32, 128) (fm2) output that you want to concatenate. Then,
concatted = tf.keras.layers.Concatenate(axis=1)([fm1, fm2])
works totally fine, giving you an output of size (None, 96, 128).
Let's come to the 2D problem. Now you have two tensors, (None, 15, 15, 128) and (None, 6, 6, 128), and want to end up with a (None, 21, 21, 128) sized output. Well, the math doesn't work here. To understand why, reduce this to a 1D format. Then you get
fm1 -> (None, 225, 128)
fm2 -> (None, 36, 128)
By concat you get,
concatted -> (None, 261, 128)
For the math to work out you would need (None, 441, 128), which is reshapeable to (None, 21, 21, 128). So this cannot be achieved unless you pad the smaller reshaped tensor with 441 - 261 = 180 extra positions and then reshape the concatenated result to the desired shape. The following is an example of how you can do it (here K is tf.keras.backend):
concatted = tf.keras.layers.Lambda(
    lambda x: K.reshape(
        K.concatenate(
            [K.reshape(x[0], (-1, 225, 128)),
             tf.pad(
                 K.reshape(x[1], (-1, 36, 128)), [(0, 0), (0, 180), (0, 0)]
             )
            ], axis=1
        ), (-1, 21, 21, 128))
)([fm1, fm2])
Important: I can't guarantee the performance of your model; this just solves your problem mathematically. From a machine learning perspective, I wouldn't advise it. The best way would be to make sure the outputs have compatible sizes for concatenation. A few ways would be:
Not reducing the size of the convolution outputs (stride=1 and padding='same')
Using a transposed convolution operation to upsample the smaller one (see the sketch below)
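As a rough sketch of that second option (the layer hyperparameters are assumptions chosen to make the shapes line up), a Conv2DTranspose with kernel_size=5 and strides=2 and 'valid' padding maps the (6, 6, 128) tensor to (15, 15, 128), after which a channel-wise concatenation works:

import tensorflow as tf

# Hypothetical feature maps with the shapes quoted in the question.
fm1 = tf.keras.Input(shape=(15, 15, 128))
fm2 = tf.keras.Input(shape=(6, 6, 128))

# (6 - 1) * 2 + 5 = 15, so the spatial dims now match fm1.
up = tf.keras.layers.Conv2DTranspose(128, kernel_size=5, strides=2, padding='valid')(fm2)

merged = tf.keras.layers.Concatenate(axis=-1)([fm1, up])
print(up.shape, merged.shape)  # (None, 15, 15, 128) (None, 15, 15, 256)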
I am trying to construct a model that looks like this.
Notice that the output shape of the padding layer is 1 x 48 x 48 x 32, while the input shape to the padding layer is 1 x 48 x 48 x 16. Which type of padding operation does that?
My code:
prelu3 = tf.keras.layers.PReLU(shared_axes = [1, 2])(add2)
deptconv3 = tf.keras.layers.DepthwiseConv2D(3, strides=(2, 2), padding='same')(prelu3)
conv4 = tf.keras.layers.Conv2D(32, 1, strides=(1, 1), padding='same')(deptconv3)
maxpool1 = tf.keras.layers.MaxPool2D()(prelu3)
pad1 = tf.keras.layers.ZeroPadding2D(padding=(1, 1))(maxpool1) # This is the padding layer where problem lies.
This is the part of the code that is trying to replicate that block. However, I get a model that looks like this.
Am I missing something here or am I using the wrong layer?
By default, the Keras ZeroPadding2D layer takes in:
Input shape: a 4D tensor with shape (batch_size, rows, cols, channels).
Output shape: (batch_size, padded_rows, padded_cols, channels)
Please have a look at the zero_padding2d layer docs in Keras.
In that respect, you are trying to double what is being treated as the channel dimension here.
Your input looks more like (batch, x, y, z) and you want to have a (batch, x, y, 2*z)
Why do you want a zero padding to double your z? I would rather suggest using a dense layer, like
tf.keras.layers.Dense(32)(maxpool1)
That would increase the z dimension from 16 to 32.
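A quick sketch of that suggestion, assuming a (48, 48, 16) tensor like the one feeding the padding layer in the question:

import tensorflow as tf

# Assumed input matching the (1, 48, 48, 16) tensor feeding the padding layer.
maxpool1 = tf.keras.Input(shape=(48, 48, 16))

# Dense acts on the last axis only, so it maps 16 -> 32 channels per spatial position.
out = tf.keras.layers.Dense(32)(maxpool1)
print(out.shape)  # (None, 48, 48, 32)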
Edit:
I found something that can help you.
tf.keras.layers.ZeroPadding2D(
    padding=(0, 8), data_format="channels_first"
)(maxpool1)
What this does is treat your (y, z) dims as the spatial (x, y) dims and x as the channel dim, so it pads (0, 8) around (y, z), turning z = 16 into 16 + 8 + 8 = 32.
Demo:
import tensorflow as tf

input_shape = (4, 28, 28, 3)
x = tf.keras.layers.Input(shape=input_shape[1:])
y = tf.keras.layers.Conv2D(16, 3, activation='relu', dilation_rate=2, input_shape=input_shape[1:])(x)
x = tf.keras.layers.ZeroPadding2D(
    padding=(0, 8), data_format="channels_first"
)(y)
print(y.shape, x.shape)
(None, 24, 24, 16) (None, 24, 24, 32)
As the title says, I'm looking to determine the proper dimensions for my CNN architecture. First, I obtain the next element of my dataset:
train_ds = iter(model.train_dataset)
feature, label = next(train_ds)
Here feature has dimensions (32, 64, 64, 4), corresponding to a batch size of 32, a height of 64, a width of 64, and an extended batch size of 4 (not a channel dimension). I initialize my 4-D kernel to pass over my 3-D tensor, as I do not want the extended batch size to be convolved. What I mean by this is that, in practice, I want a 2-D kernel of size (1, 1) to pass over each 64 x 64 image, and to do the same for the extended batch dimension without convolving the extended batches together. So I am in fact doing a (1, 1) convolution for each image in parallel. So far I was able to initialize the kernel and feed conv2d like so:
kernel = tf.constant(np.ones((1, 1, 4, 4)), dtype=tf.float32)
output = tf.nn.conv2d(feature, kernel, strides=[1, 1, 1, 1], padding='SAME')
Doing this produces my expected output, (32, 64, 64, 4). But I have absolutely no idea how to initialize the weights so that they work with this architecture. I have something like this:
w_init = tf.random_normal_initializer()
input_dim = (4, 1, 1, 4)
w = tf.Variable(
    initial_value=w_init(shape=input_dim, dtype="float32"),
    trainable=True)
tf.matmul(output, w)
But I'm receiving an incompatible-batch-dimensions error because I don't know what input_dim should be. I know it should be something like (num_filters * filter_size * filter_size * num_channels) + num_filters according to this answer, but I'm pretty sure that doesn't work for my scenario.
After tinkering around, I was able to come up with a solution: the weights need to be of size (1, 1, 4, 4), i.e. (filter_height, filter_width, in_channels, out_channels), which is the filter layout tf.nn.conv2d expects. If anyone wants to provide a mathematical or similar explanation, it would be much appreciated!
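For illustration, here is a minimal sketch of a trainable version of that kernel (variable names and the random input are assumptions): a (1, 1, 4, 4) variable can replace the constant kernel directly, with no separate matmul needed.

import tensorflow as tf

# Hypothetical batch matching the shapes described above: (32, 64, 64, 4).
feature = tf.random.normal((32, 64, 64, 4))

# Trainable (1, 1) kernel: (filter_height, filter_width, in_channels, out_channels).
w_init = tf.random_normal_initializer()
w = tf.Variable(initial_value=w_init(shape=(1, 1, 4, 4), dtype="float32"),
                trainable=True)

output = tf.nn.conv2d(feature, w, strides=[1, 1, 1, 1], padding='SAME')
print(output.shape)  # (32, 64, 64, 4)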