I have images with shape (100, 100, 3), and I want to use keras 1D convolution to classify the images.
I want to know if this is possible, and what is the shape of the input I need to use.
PS: I use tf.data.Dataset, and my dataset is batched with shape (20, 100, 100, 3).
I assume you mean 1x1 convolutions which convolve images across layers. In your case the layer code would be:
tf.keras.layers.Conv2D(filters=NUM_FILTERS, kernel_size=1, strides=1)
Conv1D is indeed for 1D data processing (like sound), as @MatusDubrava pointed out.
Should we use 1D convolution for image classification?
TLDR; Not by itself, but maybe if composed.
The correlation between pixels in an image (be it 2D or 3D due to multiple channels) is of spatial nature: the value of a given pixel is highly influenced by the neighboring pixels both vertically and horizontally. The advantage of 2D/3D Convolution (Conv2D or Conv3D) is that they manage to capture this influence in both spatial directions: vertical and horizontal.
In comparison, 1D convolution (Conv1D) only captures one of the two correlations (either vertical or horizontal), thus yielding much more limited information. By itself, a single Conv1D will leave out substantial information.
Nonetheless, since a Conv2D can be 'decomposed' into two Conv1D blocks (this is similar to the Pointwise & Depthwise convolutions in the MobileNet architecture), concatenating a vertical Conv1D and a horizontal Conv1D captures the spatial correlation along both axes. This is a valid approach to image classification as an alternative to Conv2D.
Can we use 1D convolution for image classification? How?
Yes, we can.
You should not reshape the data to reduce dimensions: if you do, you would be taping together one end of the image (say the top, if the Conv1D is applied vertically) with the other end (say, the bottom), which breaks spatial coherence.
Here is a possible example of how to do it (implementing the concatenation explained above; padding='same' keeps both branches at the same spatial size so they can be concatenated):
import tensorflow as tf

x = tf.random.normal(shape=(20, 100, 100, 3))  # your input batch

# Horizontal Conv1D: the kernel slides along the image width
y_h = tf.keras.layers.Conv1D(
    filters=32, kernel_size=3, padding='same', activation='relu')(x)

# Vertical Conv1D: transpose so the kernel slides along the image height
y_v = tf.transpose(x, perm=[0, 2, 1, 3])  # image rows to columns
y_v = tf.keras.layers.Conv1D(
    filters=32, kernel_size=3, padding='same', activation='relu')(y_v)
y_v = tf.transpose(y_v, perm=[0, 2, 1, 3])  # undo the transpose so the axes match y_h

# Concatenate results on the feature-map axis
y = tf.keras.layers.Concatenate(axis=3)([y_h, y_v])
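With padding='same', both branches keep the 100×100 spatial size, so y ends up with shape (20, 100, 100, 64).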
Note that you need multiple operations to obtain a result (convolution over the vertical and horizontal axes) that would be easier and faster to obtain by applying Conv2D directly.
When should we use this?
If your image data is particularly uninformative along one axis while being particularly interesting along the other spatial axis, it might be an idea worth exploring. Otherwise it is better to resort to the standard Conv2D (most cases out there, including almost all public image datasets).
Related
I want to predict a spatio-temporal system with dimension (x,y,t) using a convolutional neural network. My time-dimension is magnitudes smaller than both space dimensions, i.e. t=16, x=y=1024.
I am currently using 3D convolutional layers with a channel dimension of 1. Say I choose a kernel size of 5 for all dimensions; I end up with a kernel of size (5, 5, 5, 1). Since 3D convolutions are computationally expensive, the model training tends to crash a lot. One workaround I thought of is to use 2D convolutional layers and treat the time dimension as channels. My resulting kernel would be of size (5, 5, 16) with t = 16 time steps.
In summary, by using 2D convolutions I get rid of one dimension to stride along. On the other hand, the 2D layer has more weights / trainable parameters compared to the 3D layer; 2D: 5x5x16 = 400 vs. 3D: 5x5x5x1 = 125 (+ bias, respectively). How does this translate to the computational cost between Conv3D and Conv2D layers?
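To make the parameter comparison concrete, here is a minimal sketch that only builds the two kernels for the shapes in the question (one output filter each; nothing else about the real model is assumed):
import tensorflow as tf

conv3d = tf.keras.layers.Conv3D(filters=1, kernel_size=5)
conv3d.build((None, 16, 1024, 1024, 1))   # (batch, t, x, y, channels=1)
print(conv3d.count_params())              # 5*5*5*1*1 + 1 bias = 126

conv2d = tf.keras.layers.Conv2D(filters=1, kernel_size=5)
conv2d.build((None, 1024, 1024, 16))      # time steps treated as channels
print(conv2d.count_params())              # 5*5*16*1 + 1 bias = 401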
I have designed a neural network using 2D convolutional layers and max-pooling layers. The input is one-hot-encoded sequences stored as a 2D array, which is reshaped before being fed to the model:
data = np.zeros( (100, 21 * 1000), dtype=np.float32 )
#reshape
x_data = tf.reshape( data, [-1, 1, 1000, 21] )
I then used the same dataset with 1D convolutional layers, changing the model and the input array; no reshaping is needed since the data is already 1D:
data = np.zeros( (100, 1000,21), dtype=np.float32 )
Finally, the 1D convolutional model performed well with 96% accuracy, while the 2D CNN gave 93%. Can someone explain to me what actually happens there to increase the accuracy?
Can someone explain to me what actually happens there to increase the accuracy?
That's hard to tell; it depends on your specific dataset, network, hyperparameters, etc.
Generally, in a Conv2D layer the filter shifts both horizontally and vertically during the convolution. In a Conv1D layer the filter shifts along only one axis.
So which one is best? That depends on your problem. For time series Conv1D could be better, and for images Conv2D could be the better choice.
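To make that difference concrete, here is a small sketch (filter count and kernel size are arbitrary) that builds the two layer types for the shapes in the question and inspects the kernel shapes they would learn:
import tensorflow as tf

c1 = tf.keras.layers.Conv1D(filters=8, kernel_size=5)
c1.build((None, 1000, 21))       # (positions, one-hot channels), as in the 1D input
print(c1.kernel.shape)           # (5, 21, 8): the filter slides along the 1000 positions only

c2 = tf.keras.layers.Conv2D(filters=8, kernel_size=5)
c2.build((None, 1, 1000, 21))    # the reshaped (1, 1000, 21) input from the question
print(c2.kernel.shape)           # (5, 5, 21, 8): the filter covers both spatial axes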
In my problem, I want to convolve two tensors in my neural network model.
The shapes of the two tensors are [None, 2, 1] and [None, 3, 1] respectively, where the axis with dimension None is the batch size. For each sample in the batch, I want to convolve the two tensors of shape [2, 1] and [3, 1].
However, tf.nn.conv1d in TensorFlow can only convolve the input with a fixed kernel. Is there any function that supports convolving two tensors along the batch axis, similar to tf.multiply, which multiplies two tensors element-wise for each sample?
The code I ran can be simplified as follows:
input_signal = Input(shape=(L, M), name='input_signal')
input_h = Input(shape=(N,), name='input_h')
faded = Lambda(lambda x: tf.nn.conv1d(input_signal, x))(input_h)
What I want is for each sample of input_signal to be convolved with the sample of input_h that has the same index. The snippet above only sketches the idea and does not actually run. My question is how I can modify the code so that one input tensor is convolved with another input tensor for every sample in the batch.
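One possible way to get such a per-sample convolution (a minimal sketch, not from the original post, using tf.map_fn so each sample is convolved with its own kernel; the shapes follow the question):
import tensorflow as tf

batch, L, M, N = 4, 2, 1, 3                  # example sizes; M is the channel count

signals = tf.random.normal((batch, L, M))    # [None, 2, 1] in the question
kernels = tf.random.normal((batch, N, M))    # [None, 3, 1] in the question

def per_sample_conv(pair):
    signal, h = pair
    signal = tf.reshape(signal, (1, L, M))   # tf.nn.conv1d expects (batch, width, channels)
    kernel = tf.reshape(h, (N, M, 1))        # (filter_width, in_channels, out_channels)
    return tf.nn.conv1d(signal, kernel, stride=1, padding='SAME')[0]

faded = tf.map_fn(per_sample_conv, (signals, kernels),
                  fn_output_signature=tf.float32)   # one convolution per sample
print(faded.shape)                           # (4, 2, 1)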
According to the documentation of the kernel_size argument for the Conv1D layer (or any other convolutional layer), you cannot add multiple filters with different kernel sizes or strides within a single layer.
Also, convolutions with kernels of different sizes will produce outputs of different height and width.
The general formula for output size assuming a symmetric kernel is given by
(X−K+2P)/S+1
where X is the input height/width,
K is the kernel size,
P is the zero-padding,
S is the stride length.
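For example, with X = 10, K = 3, P = 0 and S = 1 the output size is (10 − 3 + 0)/1 + 1 = 8, whereas K = 2 gives 9, so the two branches in the example below end up with different lengths.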
So, assuming you keep the zero-padding and the stride the same, you cannot have multiple kernels with different sizes in a Conv1D layer.
You can, however, use the tf.keras.Model (functional) API to apply Conv1D multiple times to the same input, or multiple Conv1D layers to different inputs with their respective kernel sizes in your case, and then either max-pool, crop, or zero-pad to match the dimensions of the different outputs before stacking them.
Example:
inputs = tf.keras.Input(shape=(n_timesteps, n_features))
x1 = tf.keras.layers.Conv1D(filters=32, kernel_size=2)(inputs)  # length: n_timesteps - 1
x2 = tf.keras.layers.Conv1D(filters=16, kernel_size=3)(inputs)  # length: n_timesteps - 2
# Match the temporal dimensions of x1 and x2 here, e.g. by cropping x1:
x1 = tf.keras.layers.Cropping1D(cropping=(0, 1))(x1)
x3 = tf.keras.layers.Concatenate(axis=-1)([x1, x2])
You can use ZeroPadding1D, Cropping1D, or MaxPooling1D to match the dimensions.
So far, I've been practicing neural networks on numerical datasets in pandas, but now I need to create a model that will take an image as input and output a binary mask of that image.
I have my training data as numpy arrays of shape (602, 2048, 2048, 1). 602 images of dimensions 2048x2048 with one channel. The array of output masks have the same dimensions.
What I can't figure out is how to define the first layer or how to correctly feed the data into the model. I would greatly appreciate your help on this issue
Well, this is not a "rule", but probably you will be using mostly 2D conv and related layers.
You feed everything as numpy arrays, as usual, maybe normalizing the values. Common options are:
Between 0 and 1 (just divide by 255.)
Between -1 and 1 (divide by 255., multiply by 2, subtract 1)
Caffe style: subtract from each channel a specific value to "center" the values based on their usual mean without rescaling them.
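A minimal sketch of the three options (the array here is just a small dummy standing in for your images):
import numpy as np

images = np.random.randint(0, 256, size=(2, 2048, 2048, 1), dtype=np.uint8)  # dummy stand-in
x = images.astype(np.float32)

x_01  = x / 255.                      # between 0 and 1
x_11  = x / 255. * 2. - 1.            # between -1 and 1
x_cen = x - x.mean(axis=(0, 1, 2))    # Caffe style: subtract a per-channel mean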
Your model should start with something like:
inputTensor = Input((2048,2048,1))
output = Conv2D(filters, kernel_size, ...)(inputTensor)
Or, in sequential models: model.add(Conv2D(..., input_shape=(2048,2048,1)))
Later, it's up to you to decide which layers to use.
Conv2D
MaxPooling2D
UpSampling2D
Whether you're going to create a linear model or if you're going to divide branches, join branches, etc. is also your call.
Models in a U-Net style should be a good start for you.
What you can't do:
Don't use Flatten layers (actually you can, if you later reshape the output back to image dimensions... but why?)
Don't use Global Pooling layers (you don't want to sacrifice your spatial dimensions)
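Putting these suggestions together, a minimal encoder-decoder sketch (the filter counts and depth are arbitrary placeholders; a real U-Net would also add skip connections between the downsampling and upsampling paths):
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

inputTensor = Input((2048, 2048, 1))
x = Conv2D(16, 3, padding='same', activation='relu')(inputTensor)
x = MaxPooling2D(2)(x)                           # 1024 x 1024
x = Conv2D(32, 3, padding='same', activation='relu')(x)
x = MaxPooling2D(2)(x)                           # 512 x 512
x = Conv2D(32, 3, padding='same', activation='relu')(x)
x = UpSampling2D(2)(x)                           # back to 1024 x 1024
x = Conv2D(16, 3, padding='same', activation='relu')(x)
x = UpSampling2D(2)(x)                           # back to 2048 x 2048
output = Conv2D(1, 1, activation='sigmoid')(x)   # one-channel binary mask

model = Model(inputTensor, output)
model.compile(optimizer='adam', loss='binary_crossentropy')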
As the guide A guide to convolution arithmetic for deep learning explains, a deconvolutional layer can be transformed into an equivalent convolutional layer.
However, when the original convolution has a stride larger than one, the equivalent convolution of the deconvolution should take a stretched input, obtained by inserting s − 1 zeros between each pair of input units, where s is the stride of the original convolution.
Here is an example:
[The transpose of convolving a 3×3 kernel over a 5×5 input padded with a 1×1 border of zeros using 2×2 strides]
Here is the problem: because TensorFlow only provides a 2-D version of the deconvolutional layer, if I want to implement a 1-D deconvolutional layer for an original convolutional layer whose stride is larger than one, how can I insert those zeros between the input units?
Thanks very much
I just found that the convolutional layers in Keras have a parameter called dilation_rate, and it covers my requirement.
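If you do want to build the stretched input explicitly (inserting s − 1 zeros between the units of a 1-D signal), here is a minimal sketch; the stretch_1d helper is hypothetical, not a TensorFlow API:
import tensorflow as tf

def stretch_1d(x, s):
    """Insert s - 1 zeros between consecutive steps of a (batch, steps, channels) tensor."""
    steps, channels = x.shape[1], x.shape[2]
    zeros = tf.tile(tf.zeros_like(x), [1, 1, s - 1])   # (batch, steps, (s - 1) * channels)
    # Interleave every step with s - 1 zero steps, then drop the trailing zeros.
    stretched = tf.reshape(tf.concat([x, zeros], axis=-1), (-1, steps * s, channels))
    return stretched[:, :(steps - 1) * s + 1, :]

x = tf.reshape(tf.range(1., 7.), (1, 6, 1))   # [[1], [2], ..., [6]]
print(stretch_1d(x, 2)[..., 0])               # [[1. 0. 2. 0. 3. 0. 4. 0. 5. 0. 6.]]
Recent TensorFlow versions also provide tf.keras.layers.Conv1DTranspose and tf.nn.conv1d_transpose, which perform the transposed 1-D convolution directly.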