I have two tensors of shape N x D1 and M x D2 where D1 > D2, called X and Y respectively. For my task, X acts as the input and Y acts as the filter.
I want to calculate a matrix P of shape N x M x (D1-D2+1) such that:
P[0,0,0] = dot(X[0,0:D2], Y[0,:])
P[0,0,1] = dot(X[0,1:D2+1], Y[0,:])
P[N-1,M-1,D1-D2] = dot(X[N-1,D1-D2:D1], Y[M-1,:])
I can create a for loop and manually slide Y and calculate the dot products.
However I prefer using the correlation operator.
As far as I know, TensorFlow implements a correlation operator (https://www.tensorflow.org/versions/master/api_docs/python/nn/convolution), but I don't know how to pass my tensors as the input and the filter.
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
In your case, I'd set strides to 1 and padding to SAME.
tf.nn.conv2d(X, Y, strides=[1, 1, 1, 1], padding='SAME')
Yes, you can indeed use tf.nn.conv2d(), but you need to add both the batch and the channel dimensions:
X = tf.expand_dims(tf.expand_dims(X,0),-1)
# X.shape [batch=1, in_height, in_width, in_channels=1]
Y = tf.expand_dims(tf.expand_dims(Y,-1),-1)
# Y.shape = [filter_height, filter_width, in_channels=1, out_channels=1]
# Convolution (actually correlation, see doc of conv2d)
xcorr = tf.nn.conv2d(X, Y, padding="VALID", strides=[1, 1, 1, 1])
# Padding should be VALID, since you've already padded your input
CAVEAT: You cannot extend this approach to batches of signals, because tf.nn.conv2d always applies the same filter across the batch dimension, and from my understanding you want a different filter per signal.
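For reference, the tensor P defined in the question can also be computed without an explicit loop. The following is a minimal NumPy sketch, not TensorFlow code; sliding_dot is a hypothetical helper name, and it assumes NumPy 1.20 or newer for sliding_window_view. It correlates every row of X with every row of Y:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_dot(X, Y):
    """Return P with P[n, m, k] = dot(X[n, k:k+D2], Y[m, :]).

    X has shape (N, D1), Y has shape (M, D2), with D1 >= D2.
    The result has shape (N, M, D1 - D2 + 1).
    """
    d2 = Y.shape[1]
    # windows[n, k, :] is the slice X[n, k:k+d2]; shape (N, D1 - D2 + 1, D2)
    windows = sliding_window_view(X, d2, axis=1)
    # Contract the window axis against every filter row.
    return np.einsum("nkd,md->nmk", windows, Y)
```

Because every filter row is applied to every input row, this covers the N x M case that a single conv2d call with one shared filter does not.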
Let us assume we have a tensor x with shape (64,100,5,32), which corresponds to (batchSize,Length,Height,Channels). I want to apply a 2D conv layer to each 2D matrix of size (100,5), one for each of the 32 channels. So I need to extract 32 slices and process them with the same 2D conv layer (shared parameters). I don't know how to start with Lambda and map_fn (please don't use a TimeDistributed layer). In the end, I want a tensor of size (64,100,5,32).
Thanks for a short code snippet showing how to do this.
You can simply use a for loop with index slicing (without a Lambda layer). Here is a dummy example:
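import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, Concatenate
from tensorflow.keras.models import Model
n_sample = 3
H, W, C = 100, 5, 32
X = np.random.uniform(0, 1, (n_sample, H, W, C))
inp = Input((H, W, C))
convs = []
conv = Conv2D(1, 3, padding='same') # this is always the same for all the slices
for c in range(inp.shape[-1]):
    _x = tf.expand_dims(inp[:, :, :, c], -1) # slice channel c -> (batch, H, W, 1)
    convs.append(conv(_x)) # apply the shared layer to this slice
convs = Concatenate()(convs) # back to (batch, H, W, C)
model = Model(inp, convs)
model.compile('adam', 'mse')
model.fit(X, X, epochs=2)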
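An alternative to looping over the channels is to fold the channel axis into the batch axis, apply one single-channel convolution, and unfold again. This is not part of the answer above; the sketch below only shows the reshaping step in NumPy, and the helper names channels_to_batch and batch_to_channels are hypothetical. The same transpose and reshape pattern should carry over to the corresponding TensorFlow ops.

```python
import numpy as np

def channels_to_batch(x):
    """(B, H, W, C) -> (B*C, H, W, 1): every channel becomes its own sample."""
    b, h, w, c = x.shape
    # Move channels next to the batch axis before merging the two.
    return np.transpose(x, (0, 3, 1, 2)).reshape(b * c, h, w, 1)

def batch_to_channels(y, b, c):
    """(B*C, H, W, 1) -> (B, H, W, C): inverse of channels_to_batch."""
    _, h, w, _ = y.shape
    return np.transpose(y.reshape(b, c, h, w), (0, 2, 3, 1))
```

Any layer with one input channel and one output channel applied between the two calls uses the same weights for all slices, which is the behaviour asked for.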
I want to transform general tensors / vectors in Tensorflow, but to have a concrete example let's say rotate images.
For this I would like to have a rotation matrix R, which is learned by my network, i.e. there should be gradients computable.
How would you do this?
I found tf.contrib.image.transform, but for this it is said no gradients are computed into the transformation parameters.
Via py_func also the gradients are not available or would have to be calculated by hand - before writing a long custom solution for this (if even possible), are there maybe any ready-to-use solutions?
I cannot be the first one doing this.
For the requested code: I just want to feed an image as input, maybe apply some convolutional layer and in the end get a 2x2 matrix representing my transformation:
conv_1 = tf.layers.conv2d(conv1, 16, [3, 3], strides=(2, 2), padding='same', activation=tf.nn.leaky_relu)
M = tf.contrib.layers.fully_connected(conv_n, 4, activation_fn=tf.nn.tanh)
The matrix M then describes how my indices are transformed (imagining each pixel in the image as a vector with endpoint x, y), and I move each pixel then to its new location.
In numpy I could for example do this:
indices = []
for i in range(28):
    for j in range(28):
        indices.append([i, j])
indices = np.repeat(np.expand_dims(np.asarray(indices), 0), self.batch_size, 0)
transformed = []
for b in range(self.batch_size):
    transformed.append(tf.matmul(indices[b], M[b]))
transformed = tf.stack(transformed)
transformed_img = np.zeros((self.batch_size, 28, 28))
for b in range(self.batch_size):
    transformed_img[b, transformed[b, :, :, 0].astype(np.int32), transformed[b, :, :, 1].astype(np.int32)] = input_img[b, :, :, 0]
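One reason the integer-index version above cannot pass gradients to M is the cast to int32: rounding is piecewise constant. A common workaround, used in spatial-transformer-style layers, is bilinear sampling, where the interpolation weights depend smoothly on the transformed coordinates. The sketch below is a plain NumPy illustration of that idea for a single 2D image, not a ready-made TensorFlow op; warp_bilinear is a hypothetical name, and treating M as a map from output coordinates to source coordinates about the image centre is an assumption. NumPy itself computes no gradients; the point is that the same arithmetic written with differentiable ops would.

```python
import numpy as np

def warp_bilinear(img, M):
    """Warp a 2D image with a 2x2 matrix M about the image centre.

    M maps output pixel coordinates to source coordinates (inverse mapping);
    values are read with bilinear interpolation, zero outside the image.
    """
    H, W = img.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    centre = np.array([(H - 1) / 2.0, (W - 1) / 2.0])
    # Output coordinates relative to the centre, shape (H*W, 2) as (row, col).
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1) - centre
    src = coords @ M.T + centre
    y, x = src[:, 0], src[:, 1]
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    wy, wx = y - y0, x - x0  # fractional parts: these carry the dependence on M

    def sample(yy, xx):
        # Read img at integer positions, returning 0 outside the image.
        valid = (yy >= 0) & (yy < H) & (xx >= 0) & (xx < W)
        out = np.zeros(yy.shape, dtype=float)
        out[valid] = img[yy[valid], xx[valid]]
        return out

    out = ((1 - wy) * (1 - wx) * sample(y0, x0)
           + (1 - wy) * wx * sample(y0, x0 + 1)
           + wy * (1 - wx) * sample(y0 + 1, x0)
           + wy * wx * sample(y0 + 1, x0 + 1))
    return out.reshape(H, W)
```

With the identity matrix the image is returned unchanged, and with minus the identity it is rotated by 180 degrees, which is a quick sanity check before plugging in a learned matrix.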
I'm trying to build a neural net, but I can't figure out where I'm going wrong with the max pooling layer.
self.embed1 = nn.Embedding(256, 8)
self.conv_1 = nn.Conv2d(1, 64, (7,8), padding = (0,0))
self.fc1 = nn.Linear(64, 2)
def forward(self,x):
    import pdb; pdb.set_trace()
    x = self.embed1(x) #input a tensor of ([1,217]) output size: ([1, 217, 8])
    x = x.unsqueeze(0) #conv lay needs a tensor of size (B x C x W x H) so unsqueeze here to make ([1, 1, 217, 8])
    x = self.conv_1(x) #creates 64 filter of size (7, 8).Outputs ([1, 64, 211, 1]) as 6 values lost due to not padding.
    x = torch.max(x,0) #returning max over the 64 columns. This returns a tuple of length 2 with 64 values in each att, the max val and indices.
    x = x[0] #I only need the max values. This returns a tensor of size ([64, 211, 1])
    x = x.squeeze(2) #linear layer only wants the number of inputs and number of outputs so I squeeze the tensor to ([64, 211])
    x = self.fc1(x) #Error Size mismatch (M1: [64 x 211] M2: [64 x 2])
I understand why the linear layer isn't accepting 211 however I don't understand why my tensor after maxing over the columns isn't 64 x 2.
Your use of torch.max returns two outputs: the max value along dim=0 and the argmax along that dimension. Thus, you need to pick only the first output. (You might want to consider using adaptive max pooling for this task.)
Your linear layer expects its input to have dim 64 (that is batch_size-by-64 shaped tensor). However, it seems like your x[0] is of shape 13504x1 - definitely not 64.
See this thread for example.
If I'm guessing your intentions correctly, your mistake is that you're using torch.max for 2d maxpooling, instead of torch.nn.functional.max_pool2d. The former reduces across a tensor dimension (for instance across all feature maps or all horizontal lines), whereas the latter reduces in each square spatial neighborhood in the [h, w] plane of a [batch, features, h, w] tensor.
Instead of this:
x = x.squeeze(2)
You can do this instead:
x = x.view(-1, 64) # reshapes the [64, 211, 1] tensor to [211, 64]; note that view does not transpose the data
You can think of view as numpy reshape. We use -1 to signify that we don't know how many rows we want but we know how many columns we have, 64.
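To make the shapes in this thread concrete, here is a small NumPy sketch that mirrors the reductions discussed above. NumPy stands in for PyTorch only to show the shape arithmetic; the array is random data with the shape reported for the conv_1 output, and the analogous PyTorch call for the reduction over positions would be torch.max(x, dim=2)[0].

```python
import numpy as np

# Stand-in for the conv_1 output: (batch, channels, positions, 1)
x = np.random.rand(1, 64, 211, 1)

# Reducing over axis 0 removes the batch axis, which is what torch.max(x, 0) does.
over_batch = x.max(axis=0)            # shape (64, 211, 1)

# Reducing over axis 2 takes the max over the 211 positions for each filter.
over_positions = x.max(axis=2)        # shape (1, 64, 1)
pooled = over_positions.squeeze(-1)   # shape (1, 64), fits nn.Linear(64, 2)
```

Reducing over the position axis rather than the batch axis yields one value per filter, which is the batch_size-by-64 input that the linear layer expects.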
I am trying to develop a 1D convolutional neural network with residual connections and batch-normalization based on the paper Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks, using keras.
This is the code so far:
# define model
x = Input(shape=(time_steps, n_features))
# First Conv / BN / ReLU layer
y = Conv1D(filters=n_filters, kernel_size=n_kernel, strides=n_strides, padding='same')(x)
y = BatchNormalization()(y)
y = ReLU()(y)
shortcut = MaxPooling1D(pool_size = n_pool)(y)
# First Residual block
y = Conv1D(filters=n_filters, kernel_size=n_kernel, strides=n_strides, padding='same')(y)
y = BatchNormalization()(y)
y = ReLU()(y)
y = Dropout(rate=drop_rate)(y)
y = Conv1D(filters=n_filters, kernel_size=n_kernel, strides=n_strides, padding='same')(y)
# Add Residual (shortcut)
y = add([shortcut, y])
# Repeated Residual blocks
for k in range(2, 3): # smaller network for testing
    shortcut = MaxPooling1D(pool_size = n_pool)(y)
    y = BatchNormalization()(y)
    y = ReLU()(y)
    y = Dropout(rate=drop_rate)(y)
    y = Conv1D(filters=n_filters * k, kernel_size=n_kernel, strides=n_strides, padding='same')(y)
    y = BatchNormalization()(y)
    y = ReLU()(y)
    y = Dropout(rate=drop_rate)(y)
    y = Conv1D(filters=n_filters * k, kernel_size=n_kernel, strides=n_strides, padding='same')(y)
    y = add([shortcut, y])
z = BatchNormalization()(y)
z = ReLU()(z)
z = Flatten()(z)
z = Dense(64, activation='relu')(z)
predictions = Dense(classes, activation='softmax')(z)
model = Model(inputs=x, outputs=predictions)
# Compiling
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
# Fitting
model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch)
And this is the graph of a simplified model of what I am trying to build.
The model described in the paper uses an incrementing number of filters:
The network consists of 16 residual blocks with 2 convolutional layers per block. The convolutional layers all have a filter length of 16 and have 64k filters, where k starts out as 1 and is incremented every 4-th residual block. Every alternate residual block subsamples its inputs by a factor of 2, thus the original input is ultimately subsampled by a factor of 2^8. When a residual block subsamples the input, the corresponding shortcut connections also subsample their input using a Max Pooling operation with the same subsample factor.
But I can only make it work if I use the same number of filters in every Conv1D layer, with k=1, strides=1 and padding=same, without applying any MaxPooling1D. Any change to these parameters causes a tensor size mismatch, and the model fails to build with the following error:
ValueError: Operands could not be broadcast together with shapes (70, 64) (70, 128)
Does anyone have any idea on how to fix this size mismatch and make it work?
In addition, if the input has more than one channel (or feature), the mismatch is even worse! Is there a way to deal with more than one channel?
The tensor shape mismatch should be happening in the add([y, shortcut]) layer. MaxPooling1D halves the number of time steps by default (pool_size=2), which you can change with the pool_size parameter, while the residual branch does not reduce the time steps by the same amount. Apply strides=2 with padding='same' in one of the Conv1D layers of the block (preferably the last one) before adding shortcut and y.
For reference, you can check out the ResNet code in the keras-applications repository on GitHub.
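The shapes on both branches can be checked by hand before building the model. The sketch below is plain Python with hypothetical helper names; it assumes the standard output-length rules for a Conv1D with padding='same' and for a MaxPooling1D with the default 'valid' padding and strides equal to pool_size, and it simplifies the residual branch to a single strided convolution.

```python
import math

def conv1d_same_len(length, stride):
    """Output length of Conv1D with padding='same'."""
    return math.ceil(length / stride)

def maxpool1d_valid_len(length, pool_size):
    """Output length of MaxPooling1D with strides == pool_size, padding='valid'."""
    return (length - pool_size) // pool_size + 1

def residual_shapes(length, in_filters, out_filters, conv_stride, pool_size):
    """Return the (time_steps, channels) shapes of the shortcut and main branch."""
    shortcut = (maxpool1d_valid_len(length, pool_size), in_filters)
    main = (conv1d_same_len(length, conv_stride), out_filters)
    return shortcut, main

# Reproduces the shapes in the reported error: same length, different channels.
print(residual_shapes(70, 64, 128, conv_stride=1, pool_size=1))
# Matching stride, pool size, and filter count makes the two branches agree.
print(residual_shapes(70, 64, 64, conv_stride=2, pool_size=2))
```

Note that the reported error, (70, 64) versus (70, 128), is a mismatch in the channel dimension, which matching strides alone does not remove. A common ResNet remedy, stated here as an assumption rather than something from the answer above, is to pass the shortcut through a Conv1D with kernel_size=1 and the new filter count before the add.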
I'm trying to implement this article:
Specifically, equation (3) from section 2.
In short, I want to do a pairwise distance computation on the features of each mini-batch and add this loss to the general network loss.
I only have the tensor of the batch (16 samples), the labels tensor of the batch, and the batch feature tensor.
After looking for quite a while I still couldn't figure out the following:
1) How do I divide the batch into positive (i.e. same label) and negative pairs? Since tensors are not iterable, I can't figure out how to get which sample has which label and then divide my vector, or how to get which indices of the tensor belong to each class.
2) How can I do pairwise distance calculation for some of the indices in the batch tensor?
3) I also need to define a new distance function for negative examples
Overall, I need to get which indices belong to which class, do a positive pairwise distance calculation for all positive pairs, and do another calculation for all negative pairs. Then sum it all up and add it to the network loss.
Any help (to one of more of the 3 issues) would be highly appreciated.
1) You should do the pair sampling before feeding the data into a session. Give every pair a boolean label, say y = 1 for a matched pair and 0 otherwise.
2) 3) Just calculate both the pos/neg terms for every pair, and let the 0-1 label y choose which one is added to the loss.
First create placeholders, y_ is for boolean labels.
dim = 64
x1_ = tf.placeholder('float32', shape=(None, dim))
x2_ = tf.placeholder('float32', shape=(None, dim))
y_ = tf.placeholder('uint8', shape=[None]) # uint8 for boolean
Then the loss tensor can be created by the function.
def loss(x1, x2, y):
    # Euclidean distance between x1 and x2, one value per pair
    l2diff = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x1, x2)), axis=1))
    # you can try other margin values
    margin = tf.constant(1.)
    labels = tf.cast(y, tf.float32)
    match_loss = tf.square(l2diff, 'match_term')
    mismatch_loss = tf.maximum(0., tf.subtract(margin, tf.square(l2diff)), 'mismatch_term')
    # if label is 1, only match_loss will count, otherwise mismatch_loss
    loss = tf.add(tf.multiply(labels, match_loss),
                  tf.multiply((1 - labels), mismatch_loss), 'loss_add')
    loss_mean = tf.reduce_mean(loss)
    return loss_mean
loss_ = loss(x1_, x2_, y_)
Then feed your data (random generated for example):
batchsize = 4
x1 = np.random.rand(batchsize, dim)
x2 = np.random.rand(batchsize, dim)
y = np.array([0,1,1,0])
sess = tf.Session() # TF 1.x session; assumes numpy is imported as np
l = sess.run(loss_, feed_dict={x1_:x1, x2_:x2, y_:y})
Short answer
I think the simplest way to do that is to sample the pairs offline (i.e. outside of the TensorFlow graph).
You create tf.placeholder for a batch of pairs along with their labels (positive or negative, i.e. same class or different class), and then you can compute in TensorFlow the corresponding loss.
With the code
You sample the pairs offline: draw batch_size pairs of inputs and store the left and the right elements of the pairs in two arrays, each of shape [batch_size, input_size]. You also output the labels of the pairs (either positive or negative), of shape [batch_size, 1].
pairs_left = np.zeros((batch_size, input_size))
pairs_right = np.zeros((batch_size, input_size))
labels = np.zeros((batch_size, 1)) # ex: [[0.], [1.], [1.], [0.]] for batch_size=4
Then you create Tensorflow placeholders corresponding to these inputs. In your code, you will feed the previous inputs to these placeholders in the feed_dict argument of sess.run()
pairs_left_node = tf.placeholder(tf.float32, [batch_size, input_size])
pairs_right_node = tf.placeholder(tf.float32, [batch_size, input_size])
labels_node = tf.placeholder(tf.float32, [batch_size, 1])
Now we can perform a feedforward on the inputs (let's say your model is a linear model).
W = ... # shape [input_size, feature_size]
output_left = tf.matmul(pairs_left_node, W) # shape [batch_size, feature_size]
output_right = tf.matmul(pairs_right_node, W) # shape [batch_size, feature_size]
Finally we can compute the pairwise loss.
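l2_loss_pairs = tf.reduce_sum(tf.square(output_left - output_right), 1) # shape [batch_size]
labels_flat = tf.reshape(labels_node, [-1]) # [batch_size, 1] -> [batch_size], avoids broadcasting to [batch_size, batch_size]
margin = 1.0 # example value (an assumption); tune it for your data
positive_loss = l2_loss_pairs
negative_loss = tf.nn.relu(margin - l2_loss_pairs)
final_loss = tf.multiply(labels_flat, positive_loss) + tf.multiply(1. - labels_flat, negative_loss)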
And that's it! You can now optimize this loss, provided the offline sampling is good.
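If you would rather form the pairs inside the batch, as the question asks, a boolean mask built from the labels separates positive from negative pairs without iterating over a tensor. The following is a minimal NumPy sketch of that idea; batch_contrastive_loss is a hypothetical name, and the loss uses the same margin form as the answers above, which may differ from equation (3) of the article. The broadcasting pattern should translate to the equivalent TensorFlow ops.

```python
import numpy as np

def batch_contrastive_loss(features, labels, margin=1.0):
    """Mean margin-based loss over all unordered pairs in a batch.

    features: array of shape (B, D); labels: array of shape (B,).
    Positive pairs (same label) contribute their squared distance;
    negative pairs contribute max(0, margin - squared distance).
    """
    # Squared Euclidean distance between every pair of rows, shape (B, B).
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # same[i, j] is True when samples i and j share a label.
    same = labels[:, None] == labels[None, :]
    # Keep each unordered pair once and drop the diagonal.
    iu = np.triu_indices(len(labels), k=1)
    sq, pos = sq_dist[iu], same[iu]
    per_pair = np.where(pos, sq, np.maximum(0.0, margin - sq))
    return per_pair.mean()
```

For example, two samples with the same label at distance 1 give a loss of 1.0, while two samples with different labels that are farther apart than the margin give 0.0.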