Transform 2D embeddings into 3D embeddings - python

I want to fuse node embeddings from a GNN shaped [N, 128], where N can vary, with the hidden state from an LSTM shaped [2, 32, 128]. To do so, I wanted to add both feature tensors, but they need to have the same shape.
I have something like:
import torch

N = 75
t = torch.rand(N, 128)      # [N, 128] node embeddings
t.unsqueeze_(0)             # [1, N, 128]
t = t.expand(2, N, 128)     # [2, N, 128]
I cannot figure out how to make dimension 1 equal to 32. I am looking for some combination of transforms and learnable layers that can help me make this shape conversion. Thanks in advance for any help you can give.
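One possible sketch (purely illustrative, every layer choice here is an assumption, not from the original post): pool the variable-size node dimension away, use a learnable linear layer to produce the fixed [32, 128] block, and expand it across the LSTM's 2 directions/layers.

import torch
import torch.nn as nn

class NodeToHidden(nn.Module):
    """Hypothetical sketch: map [N, 128] node embeddings to [2, 32, 128]."""
    def __init__(self, feat_dim=128, target_len=32, num_directions=2):
        super().__init__()
        self.feat_dim = feat_dim
        self.target_len = target_len
        self.num_directions = num_directions
        # Learnable projection from one pooled vector to target_len vectors.
        self.proj = nn.Linear(feat_dim, target_len * feat_dim)

    def forward(self, node_emb):                        # node_emb: [N, 128]
        pooled = node_emb.mean(dim=0)                   # [128], N is pooled away
        out = self.proj(pooled)                         # [32 * 128]
        out = out.view(self.target_len, self.feat_dim)  # [32, 128]
        return out.unsqueeze(0).expand(self.num_directions, -1, -1)  # [2, 32, 128]

N = 75
t = torch.rand(N, 128)
print(NodeToHidden()(t).shape)  # torch.Size([2, 32, 128])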

Related

LSTM input and output dimensions in Keras

I am confused about LSTM input/output dimensions, specifically in the Keras library. How does Keras return a 2D output while its input is 3D? I know it can return a 3D output using return_sequences=True, but if return_sequences=False, how does it take a 3D input and produce a 2D output?
For example, the input data has shape (32, 16, 20): 32 batch size, 16 timesteps, 20 features; and the output has shape (32, 100): 32 batch size, 100 hidden units. How does Keras process a 3D input and return a 2D output?
Additionally, how can I concatenate the input and the hidden state if they don't have the same dimensions?
I found the answer to my question in the link below:
https://mmuratarat.github.io/2019-01-19/dimensions-of-lstm
it's very helpful!
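For a quick illustration of the shapes (a minimal sketch using the numbers from the question, assuming an LSTM with 100 units):

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((32, 16, 20))  # (batch, timesteps, features)

# return_sequences=False (the default): only the last timestep's hidden state.
print(layers.LSTM(100)(x).shape)                         # (32, 100)

# return_sequences=True: the hidden state at every timestep.
print(layers.LSTM(100, return_sequences=True)(x).shape)  # (32, 16, 100)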

Difference between the input shape for a 1D CNN, 2D CNN and 3D CNN

I'm building a CNN model for image classification for the first time, and I'm a little confused about what the input shape should be for each type (1D CNN, 2D CNN, 3D CNN) and how to choose the number of filters in the convolution layer. My data is 100x100x30, where 30 is the number of features.
Here is my attempt at the 1D CNN using the Keras Functional API:
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
from tensorflow.keras.models import Model

def create_CNN1D_model(pool_type='max', conv_activation='relu'):
    input_layer = Input(shape=(30, 1))
    conv_layer1 = Conv1D(filters=16, kernel_size=3, activation=conv_activation)(input_layer)
    max_pooling_layer1 = MaxPooling1D(pool_size=2)(conv_layer1)
    conv_layer2 = Conv1D(filters=32, kernel_size=3, activation=conv_activation)(max_pooling_layer1)
    max_pooling_layer2 = MaxPooling1D(pool_size=2)(conv_layer2)
    flatten_layer = Flatten()(max_pooling_layer2)
    dense_layer = Dense(units=64, activation='relu')(flatten_layer)
    output_layer = Dense(units=10, activation='softmax')(dense_layer)
    CNN_model = Model(inputs=input_layer, outputs=output_layer)
    return CNN_model

CNN1D = create_CNN1D_model()
CNN1D.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Trace = CNN1D.fit(X, y, epochs=50, batch_size=100)
However, when trying the 2D CNN model by just changing Conv1D and MaxPooling1D to Conv2D and MaxPooling2D, I got the following error:
ValueError: Input 0 of layer conv2d_1 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (None, 30, 1)
Can anyone please tell me what the input shape should be for a 2D CNN and a 3D CNN? And what preprocessing should be done on the input data?
TLDR; your X_train can be looked at as (batch, spatial dims..., channels). A kernel applies to the spatial dimensions for all channels in parallel. So a 2D CNN would require two spatial dimensions: (batch, dim 1, dim 2, channels).
So for (100,100,3) shaped images, you will need a 2D CNN that convolves over 100 height and 100 width, over all the 3 channels.
Let's understand the above statement.
First, you need to understand what CNN (in general) is doing.
A kernel convolves through the spatial dimensions of a tensor across its feature maps/channels while applying a simple operation (like a dot product) to the corresponding values.
Kernel moves over the spatial dimensions
Now, let's say you have a batch of 100 images. Each image is 28 by 28 pixels and has 3 channels, R, G, B (which are also called feature maps in the context of CNNs). If I were to store this data as a tensor, the shape would be (100, 28, 28, 3).
However, I could have an image that doesn't have any height (much like a signal), OR I could have data with an extra spatial dimension, such as a video (height, width, and time).
In general, here is what the input for a CNN-based neural network looks like.
Same kernel, all channels
The second key point you need to know is that a 2D kernel will convolve over 2 spatial dimensions, BUT the same kernel will do this over all the feature maps/channels. So, if I have a (3,3) kernel, this same kernel will get applied over the R, G, B channels (in parallel) and move over the height and width of the image.
Operation is a dot product
Finally, the operation (for a single feature map/channel and single convolution window) can be visualized like below.
Therefore, in short -
A kernel gets applied to the spatial dimensions of the data
A kernel has as many dimensions as the # of spatial dimensions
A kernel applies over all the feature maps/channels at once
The operation is a simple dot product between the kernel and window
Let's take the example of tensors with single feature maps/channels (so, for an image, it would be greyscaled) -
So, with that intuition, we see that if you want to use a 1D CNN, your data must have 1 spatial dimension, which means each sample needs to be 2D (spatial dimension and channels), which means X_train must be a 3D tensor (batch, spatial dimension, channels).
Similarly, for a 2D CNN, you would have 2 spatial dimensions (e.g. H, W), each sample would be 3D (H, W, channels), and X_train would be 4D: (samples, H, W, channels).
Let's try this with code -
import tensorflow as tf
from tensorflow.keras import layers
X_2D = tf.random.normal((100,7,3)) #Samples, width/time, channels (feature maps)
X_3D = tf.random.normal((100,5,7,3)) #Samples, height, width, channels (feature maps)
X_4D = tf.random.normal((100,6,6,2,3)) #Samples, height, width, time, channels (feature maps)
For applying 1D CNN -
#With padding='same', the edges are padded so border positions are not skipped
#Out featuremaps = 10, kernel (3,)
cnn1d = layers.Conv1D(10, 3, padding='same')(X_2D)
print(X_2D.shape,'->',cnn1d.shape)
#(100, 7, 3) -> (100, 7, 10)
For applying 2D CNN -
#Out featuremaps = 10, kernel (3,3)
cnn2d = layers.Conv2D(10, (3,3), padding='same')(X_3D)
print(X_3D.shape,'->',cnn2d.shape)
#(100, 5, 7, 3) -> (100, 5, 7, 10)
For 3D CNN -
#Out featuremaps = 10, kernel (3,3,2)
cnn3d = layers.Conv3D(10, (3,3,2), padding='same')(X_4D)
print(X_4D.shape,'->',cnn3d.shape)
#(100, 6, 6, 2, 3) -> (100, 6, 6, 2, 10)
By a 100x100x30 input shape, are you saying the batch size is 100? Or is each sample of shape 100x100x30? In the second case, you must use a Conv2D layer instead. The input shape for each layer type is supposed to be:
Conv1D: (size1, channel_number), Conv2D: (size1, size2, channel_number), Conv3D: (size1, size2, size3, channel_number)
The 1D/2D/3D in Conv1D/Conv2D/Conv3D denotes the number of spatial dimensions each kernel of the convolution layer slides over.
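Assuming each sample really is 100x100 with 30 channels (this is an assumption about the data layout), a minimal Conv2D sketch would be:

import numpy as np
from tensorflow.keras import layers, models

# Hypothetical data: n_samples samples, each 100x100 with 30 channels.
n_samples = 8
X = np.random.rand(n_samples, 100, 100, 30).astype('float32')

model = models.Sequential([
    layers.Input(shape=(100, 100, 30)),
    layers.Conv2D(16, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
print(model(X).shape)  # (8, 10)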

Can 2D convolutional neural network be converted into 1D convolutional neural network?

I have designed a neural network using 2D convolutional layers and max-pooling layers. The input is one-hot encoded sequences stored as a 2D array, which is reshaped before being fed to the model.
import numpy as np
import tensorflow as tf

data = np.zeros((100, 21 * 1000), dtype=np.float32)
#reshape
x_data = tf.reshape(data, [-1, 1, 1000, 21])
However, I also used the same dataset with 1D convolutional layers, changing the model and using the input array without reshaping, since it is already 1D:
data = np.zeros((100, 1000, 21), dtype=np.float32)
Finally, the 1D convolutional model performed well with 96% accuracy, while the 2D CNN gave 93%. Can someone explain to me what actually happens there to increase the accuracy?
Can someone explain to me what actually happens there to increase the accuracy?
That's hard to tell and depends on your specific dataset, network, hyperparameters etc.
Generally, in a Conv2D layer the filter shifts both horizontally and vertically during the convolution. In a Conv1D layer the filter shifts along only one dimension.
So which one is the best? That depends on your problem. For time series conv1D could be better and for images conv2D could be the better choice.
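As a small sketch of the difference (shapes taken from the question, random values just for illustration):

import tensorflow as tf
from tensorflow.keras import layers

x_1d = tf.random.normal((100, 1000, 21))    # (batch, length, channels)
x_2d = tf.reshape(x_1d, [-1, 1, 1000, 21])  # (batch, height=1, width, channels)

# Conv1D slides its kernel along the single length dimension.
print(layers.Conv1D(32, 3, padding='same')(x_1d).shape)       # (100, 1000, 32)

# Conv2D slides along height and width; with height 1 the vertical
# movement is degenerate, so it sweeps the same positions.
print(layers.Conv2D(32, (1, 3), padding='same')(x_2d).shape)  # (100, 1, 1000, 32)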

Understanding the output shape of conv2d layer in keras

I do not understand why the channel dimension is not included in the output dimension of a conv2D layer in Keras.
I have the following model
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.models import Model

def create_model():
    image = Input(shape=(128, 128, 3))
    x = Conv2D(24, kernel_size=(8, 8), strides=(2, 2), activation='relu', name='conv_1')(image)
    x = Conv2D(24, kernel_size=(8, 8), strides=(2, 2), activation='relu', name='conv_2')(x)
    x = Conv2D(24, kernel_size=(8, 8), strides=(2, 2), activation='relu', name='conv_3')(x)
    flatten = Flatten(name='flatten')(x)
    output = Dense(1, activation='relu', name='output')(flatten)
    model = Model(inputs=image, outputs=output)
    return model

model = create_model()
model.summary()
The model summary is shown in the figure at the end of my question. The input layer takes RGB images with width = 128 and height = 128. The first Conv2D layer tells me the output dimension is (None, 61, 61, 24). I have used a kernel size of (8, 8), a stride of (2, 2), and no padding. The values 61 = floor((128 - 8 + 2 * 0)/2 + 1) and 24 (number of kernels/filters) make sense. But why isn't the dimension for the different channels included in the output? As far as I can see, the parameters for the 24 filters on each of the channels are included in the number of parameters. So I would expect the output dimension to be (None, 61, 61, 24, 3) or (None, 61, 61, 24 * 3). Is this just a strange notation in Keras, or am I confused about something else?
This question is asked in various forms all over the internet and has a simple answer which is often missed or confused:
SIMPLE ANSWER:
The Keras Conv2D layer, given a multi-channel input (e.g. a color image), will apply the filter across ALL the color channels and sum the results, producing the equivalent of a monochrome convolved output image.
An example, using a CIFAR-10 CNN:
(1) You're training with the CIFAR image dataset, which is made up of 32x32 color images, i.e. each image is shape (32,32,3) (RGB = 3 channels)
(2) Your first layer of your network is a Conv2D Layer with 32 filters, each specified as 3x3, so:
Conv2D(32, (3,3), padding='same', input_shape=(32,32,3))
(3) Counter-intuitively, Keras will configure each filter as (3,3,3), i.e. a 3D volume covering the 3x3 pixels PLUS all the color channels. As a minor detail each filter has an additional weight for a BIAS value, as per normal neural network layer arithmetic.
(4) Convolution proceeds absolutely as normal, except a 3x3x3 VOLUME from the input image is convolved at each step with the 3x3x3 filter, and a single (monochrome) output value (i.e. like a pixel) is produced at each step.
(5) The result is that a Keras Conv2D convolution with a specified (3,3) filter on a (32,32,3) image produces a (32,32) result, because the actual filter used is (3,3,3).
(6) In this example, we have also specified 32 filters in the Conv2D layer, so the actual output is (32,32,32) for each input image (i.e. you might think of this as 32 images, one for each filter, each 32x32 monochrome pixels).
As a check, you can look at the count of weights (Param #) for the layer produced by model.summary():
Layer (type) Output shape Param#
conv2d_1 (Conv2D) (None, 32, 32, 32) 896
There are 32 filters, each 3x3x3 (i.e. 27 weights) plus 1 for the bias (i.e. total 28 weights each). And 32 filters x 28 weights each = 896 Parameters.
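You can reproduce that check yourself with a minimal sketch:

from tensorflow.keras import layers, models

# Single Conv2D layer from the CIFAR-10 example above.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), padding='same'),
])
model.summary()
# Output shape: (None, 32, 32, 32), Param #: 896
# 896 = 32 filters * (3*3*3 weights + 1 bias)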
Each of the convolutional filters (8 x 8) is connected to a (8 x 8) receptive field for all the channels of the image. That is why we have (61, 61, 24) as the output of the second layer. The different channels are encoded implicitly into the weights of the 24 filters. This means, that each filter does not have 8 x 8 = 64 weights but instead 8 x 8 x Number of channels = 8 x 8 x 3 = 192 weights.
See this quote from CS231n:
Left: An example input volume in red (e.g. a 32x32x3 CIFAR-10 image), and an example volume of neurons in the first Convolutional layer. Each neuron in the convolutional layer is connected only to a local region in the input volume spatially, but to the full depth (i.e. all color channels). Note, there are multiple neurons (5 in this example) along the depth, all looking at the same region in the input - see discussion of depth columns in the text below. Right: The neurons from the Neural Network chapter remain unchanged: they still compute a dot product of their weights with the input followed by a non-linearity, but their connectivity is now restricted to be local spatially.
My guess is that you're misunderstanding how convolutional layers are defined.
My notation for the shape of the convolutional layer is (out_channels, in_channels, k, k), where k is the size of the kernel. out_channels is the number of filters (i.e. convolutional neurons). Consider the following image:
The 3D convolutional kernel weights in the picture slide across different data windows of A_{i-1} (i.e. the input image). Patches of 3D data from that image, of shape (in_channels, k, k), are paired with individual 3D convolutional kernels of matching dimensionality. How many such 3D kernels are there? As many as the number of output channels, out_channels. The depth dimension that each kernel adopts is the in_channels of A_{i-1}. Therefore, the in_channels dimension of A_{i-1} is contracted away by the depth-wise dot product that builds up the output tensor with out_channels channels. The precise way in which the sliding windows are constructed is defined by the sampling tuple (kernel_size, stride, padding) and results in an output tensor with spatial dimensions determined by the formula that you correctly applied.
If you want to understand more, including backpropagation and implementation take a look at this paper.
The formula you're using is correct. It may be a little confusing because many popular tutorials use a number of filters equal to the number of channels in the image. The TensorFlow/Keras implementation produces its output by computing num_input_channels * num_output_channels intermediate feature maps, one per (input channel, output channel) pair, each with the output's spatial size. For each output channel, the num_input_channels intermediate maps are then summed, creating an output of shape (output_height, output_width, num_output_channels). Hope this clarifies Vlad's detailed answer.
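A small sketch that makes the summation over input channels explicit (shapes here are arbitrary, chosen only for illustration): redoing the convolution one input channel at a time with the corresponding kernel slice and summing the results reproduces the layer's output.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 8, 8, 3))  # (batch, H, W, in_channels)
conv = layers.Conv2D(5, (3, 3), padding='same', use_bias=False)
full = conv(x)                      # (1, 8, 8, 5)

kernel = conv.kernel                # (3, 3, in_channels=3, out_channels=5)
per_channel = [
    tf.nn.conv2d(x[..., c:c + 1], kernel[:, :, c:c + 1, :],
                 strides=1, padding='SAME')
    for c in range(3)
]
# Summing the per-input-channel results matches the full convolution.
print(np.allclose(full, tf.add_n(per_channel), atol=1e-5))  # True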

Python numpy concatenate 4D

I'm trying to concatenate 2 numpy arrays of features predicted by the convolutional layers of a VGG16 model.
Basically, I have used the bottom layers of a VGG16 model to predict the features for my full dataset, and now I want to load parts of the dataset dynamically, based on some settings, to train some models with it.
So, I have 2 array of shape:
(724, 512, 6, 8) and (3376, 512, 6, 8)
Basically the first one contains features predicted from 724 image files (each prediction has shape (512, 6, 8)).
I want to concatenate these 2 arrays into one of shape (4100, 512, 6, 8)
I have tried using:
np.array([np.concatenate(arr, axis=0) for arr in false_train_list])
where false_train_list is the list containing the 2 arrays with the above shapes.
Also tried with np.stack, tf.stack...
All of these result in an array with shape (2,).
Can someone explain why? I haven't found any good resources to understand how exactly np.concatenate() works.
Thank you!
I think you simply need this instead:
np.concatenate(false_train_list, axis=0)
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.concatenate.html
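As a quick check with dummy arrays of the same shapes:

import numpy as np

a = np.zeros((724, 512, 6, 8), dtype=np.float32)
b = np.zeros((3376, 512, 6, 8), dtype=np.float32)

combined = np.concatenate([a, b], axis=0)
print(combined.shape)  # (4100, 512, 6, 8)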
