Backpropagatable Transformations with Tensorflow - python

I want to transform general tensors / vectors in Tensorflow, but to have a concrete example let's say rotate images.
For this I would like to have a rotation matrix R, which is learned by my network, i.e. there should be gradients computable.
How would you do this?
I found tf.contrib.image.transform, but its documentation says that no gradients are computed with respect to the transformation parameters.
With py_func the gradients are not available either, or would have to be derived by hand. Before writing a long custom solution for this (if that is even possible), are there any ready-to-use solutions?
I cannot be the first one doing this.
For the requested code: I just want to feed an image as input, maybe apply some convolutional layer and in the end get a 2x2 matrix representing my transformation:
conv_1 = tf.layers.conv2d(conv1, 16, [3, 3], strides=(2, 2), padding='same', activation=tf.nn.leaky_relu)
...
M = tf.contrib.layers.fully_connected(conv_n, 4, activation_fn=tf.nn.tanh)
The matrix M then describes how my indices are transformed (imagining each pixel in the image as a vector with endpoint x, y), and I move each pixel then to its new location.
In numpy I could for example do this:
indices = []
for i in range(28):
    for j in range(28):
        indices.append([i, j])
indices = np.repeat(np.expand_dims(np.asarray(indices), 0), self.batch_size, 0)
transformed = []
for b in range(self.batch_size):
    # M[b] holds the four entries of the 2x2 matrix
    transformed.append(np.matmul(indices[b], np.reshape(M[b], (2, 2))))
transformed = np.stack(transformed)
transformed_img = np.zeros((self.batch_size, 28, 28))
for b in range(self.batch_size):
    # the integer cast below is the step that has no useful gradient
    rows = transformed[b, :, 0].astype(np.int32)
    cols = transformed[b, :, 1].astype(np.int32)
    transformed_img[b, rows, cols] = input_img[b, :, :, 0].reshape(-1)
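A common way to make such a transformation trainable is to avoid the integer cast altogether: for every output pixel, compute the real-valued source coordinate under the matrix and interpolate bilinearly between the four neighbouring input pixels. The output is then a piecewise-smooth function of the matrix entries, which is what allows gradients to reach them (the idea behind spatial transformer networks). The sketch below shows only the forward pass in NumPy, for a single-channel image and a transformation about the image centre. The names bilinear_warp and matrix are made up for illustration; a trainable version would have to express the same arithmetic with differentiable TensorFlow ops.

```python
import numpy as np

def bilinear_warp(img, matrix):
    """Warp a 2-D image with a 2x2 matrix using bilinear interpolation.

    Hypothetical helper: for each output pixel, `matrix` maps its
    centre-relative (row, col) coordinate to a source coordinate in `img`.
    Samples that fall outside the image contribute zero.
    """
    h, w = img.shape
    centre = np.array([(h - 1) / 2.0, (w - 1) / 2.0])
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([rows, cols], axis=-1).reshape(-1, 2) - centre
    src = coords @ matrix.T + centre      # real-valued source coordinates

    r0 = np.floor(src[:, 0]).astype(int)
    c0 = np.floor(src[:, 1]).astype(int)
    dr = src[:, 0] - r0                   # fractional parts act as weights
    dc = src[:, 1] - c0

    def sample(r, c):
        # Read img at integer positions, returning 0 outside the image.
        inside = (r >= 0) & (r < h) & (c >= 0) & (c < w)
        out = np.zeros(r.shape, dtype=float)
        out[inside] = img[r[inside], c[inside]]
        return out

    out = ((1 - dr) * (1 - dc) * sample(r0, c0)
           + (1 - dr) * dc * sample(r0, c0 + 1)
           + dr * (1 - dc) * sample(r0 + 1, c0)
           + dr * dc * sample(r0 + 1, c0 + 1))
    return out.reshape(h, w)

img = np.arange(28 * 28, dtype=float).reshape(28, 28)
identity = bilinear_warp(img, np.eye(2))

theta = np.deg2rad(90)
rot90 = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
rotated = bilinear_warp(img, rot90)
```

Because the weights dr and dc depend continuously on the matrix, a small change in the matrix produces a small change in the output, unlike the scatter with integer indices shown above.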

Related

tensorflow conv2d: input depth must be evenly divisible by filter depth: 1 vs 256

The other similar questions don't work for me. My setup is much simpler, but I still get this error when using TensorFlow. I am convolving a 2D array representing a point source (a 512 x 512 array with the middle point set to 1) with a 256 x 256 array representing an imaging system. The convolution should be the point spread function of the system. When calling tf.nn.conv2d, I keep getting the error in the title. I made sure that the array shapes are what I believed TensorFlow expects, i.e. [1, 512, 512, 1] for the image and [1, 256, 256, 1] for the kernel.
def convolve(arr, kernel):
    # arr: 512 x 512 2d array
    # kernel: 256 x 256 2d array
    # make arr 4d
    f = tf.cast(tf.reshape(arr, [1, arr.shape[0], arr.shape[1], 1]), tf.float32)
    # make kernel 4d
    h = tf.cast(tf.reshape(kernel, [1, kernel.shape[0], kernel.shape[1], 1]), tf.float32)
    return tf.nn.conv2d(f, h, strides=[1, 1, 1, 1], padding="VALID")
point_source = np.zeros((512, 512))
# set the single middle pixel to 1
point_source[512 // 2, 512 // 2] = 1
plt.imshow(convolve(point_source, mask_array))
Almost there. Note what the docs state regarding the filters:
A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
Here is a working example:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
def convolve(arr, kernel):
    # arr: 512 x 512 2d array
    # kernel: 256 x 256 2d array
    # make arr 4d: [batch, height, width, channels]
    f = tf.cast(tf.reshape(arr, [1, arr.shape[0], arr.shape[1], 1]), tf.float32)
    # make kernel 4d: [filter_height, filter_width, in_channels, out_channels]
    h = tf.cast(tf.reshape(kernel, [kernel.shape[0], kernel.shape[1], 1, 1]), tf.float32)
    return tf.nn.conv2d(f, h, strides=[1, 1, 1, 1], padding="VALID")
point_source = np.zeros((512, 512))
# set the single middle pixel to 1
point_source[512 // 2, 512 // 2] = 1
mask_array = np.ones((256, 256))
plt.imshow(convolve(point_source, mask_array)[0, :, :, 0], cmap='gray')
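To see why the original kernel shape triggers the error, the shape bookkeeping can be written out without TensorFlow. This is a minimal NumPy sketch; the helper names to_nhwc, to_hwio and valid_output_shape are hypothetical and only mirror the layout rules quoted from the docs above, they are not part of any library.

```python
import numpy as np

def to_nhwc(arr):
    """2-D array -> [batch=1, height, width, channels=1] (conv2d input layout)."""
    return arr.reshape(1, arr.shape[0], arr.shape[1], 1)

def to_hwio(kernel):
    """2-D array -> [filter_height, filter_width, in_channels=1, out_channels=1]."""
    return kernel.reshape(kernel.shape[0], kernel.shape[1], 1, 1)

def valid_output_shape(input_shape, filter_shape, strides=(1, 1)):
    """Output shape of a VALID convolution; checks the channel counts first."""
    batch, in_h, in_w, in_c = input_shape
    f_h, f_w, f_in_c, f_out_c = filter_shape
    if in_c != f_in_c:
        raise ValueError(
            "input depth %d does not match filter depth %d" % (in_c, f_in_c))
    out_h = (in_h - f_h) // strides[0] + 1
    out_w = (in_w - f_w) // strides[1] + 1
    return (batch, out_h, out_w, f_out_c)

image = to_nhwc(np.zeros((512, 512)))
kernel = to_hwio(np.ones((256, 256)))
out_shape = valid_output_shape(image.shape, kernel.shape)
```

With the kernel reshaped to [1, 256, 256, 1] instead, the third entry (256) is read as in_channels, which does not match the single input channel; that is the "1 vs 256" in the error message.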

What's the cleanest and most efficient way to pass two stereo images to a loss function in Keras?

First off, why am I using Keras? I'm trying to stay as high level as possible, which doesn't mean I'm scared of low-level Tensorflow; I just want to see how far I can go while keeping my code as simple and readable as possible.
I need my Keras model (custom-built using the Keras functional API) to read the left image from a stereo pair and minimize a loss function that needs to access both the right and left images. I want to store the data in a tf.data.Dataset.
What I tried:
Reading the dataset as (left image, right image), i.e. as tensors with shape ((W, H, 3), (W, H, 3)), then use function closure: define a keras_loss(left_images) that returns a loss(y_true, y_pred), with y_true being a tf.Tensor that holds the right image. The problem with this approach is that left_images is a tf.data.Dataset and Tensorflow complains (rightly so) that I'm trying to operate on a dataset instead of a tensor.
Reading the dataset as (left image, (left image, right image)), which should make y_true a tf.Tensor with shape ((W, H, 3), (W, H, 3)) that holds both the right and left images. The problem with this approach is that it...does not work and raises the following error:
ValueError: Error when checking model target: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 1 array(s), for inputs ['tf_op_layer_resize/ResizeBilinear']
but instead got the following list of 2 arrays: [<tf.Tensor 'args_1:0'
shape=(None, 512, 256, 3) dtype=float32>, <tf.Tensor 'args_2:0'
shape=(None, 512, 256, 3) dtype=float32>]...
So, is there anything I did not consider? I read the documentation and found nothing about what gets considered as y_pred and what as y_true, nor about how to convert a dataset into a tensor smartly and without loading it all in memory.
My model is designed as such:
def my_model(input_shape):
    width = input_shape[0]
    height = input_shape[1]
    inputs = tf.keras.Input(shape=input_shape)
    # < a few more layers >
    outputs = tf.image.resize(tf.nn.sigmoid(tf.slice(disp6, [0, 0, 0, 0], [-1, -1, -1, 2])), tf.Variable([width, height]))
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model
And my dataset is built as such (in case 2, while in case 1 only the function read_stereo_pair_from_line() changes):
def read_img_from_file(file_name):
    img = tf.io.read_file(file_name)
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_png(img, channels=3)
    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size.
    return tf.image.resize(img, [args.input_width, args.input_height])
def read_stereo_pair_from_line(line):
    split_line = tf.strings.split(line, ' ')
    return read_img_from_file(split_line[0]), (read_img_from_file(split_line[0]), read_img_from_file(split_line[1]))
# Dataset loading
list_ds = tf.data.TextLineDataset('test/files.txt')
images_ds = list_ds.map(lambda x: read_stereo_pair_from_line(x))
images_ds = images_ds.batch(1)
Solved. I just needed to read the dataset as (left image, [left image, right image]) instead of (left image, (left image, right image)) i.e. make the second item a list and not a tuple. I can then access the images as input_r = y_true[:, 1, :, :] and input_l = y_true[:, 0, :, :]
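A likely explanation for why the list works where the tuple does not: tf.data treats a tuple as a structure of separate components, whereas a list of equally shaped tensors is converted into one stacked tensor, so the model receives a single target. The NumPy sketch below only illustrates the resulting layout and the slicing; the shapes are taken from the error message above, and random arrays stand in for the real images.

```python
import numpy as np

batch, width, height = 1, 512, 256   # shapes from the error message above

left = np.random.rand(batch, width, height, 3).astype(np.float32)
right = np.random.rand(batch, width, height, 3).astype(np.float32)

# Stacking along axis 1 mimics a batched (left, [left, right]) target.
y_true = np.stack([left, right], axis=1)   # (batch, 2, width, height, 3)

# Index 0 / 1 on axis 1 selects the left / right image; the trailing
# channel axis is kept implicitly.
input_l = y_true[:, 0, :, :]
input_r = y_true[:, 1, :, :]
```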

Sliding inner product using Tensorflow convolution

I have two tensors of shape N x D1 and M x D2 where D1 > D2, called X and Y respectively. For my task, X acts as the input and Y acts as the filter.
I want to calculate a matrix P of shape N x M x (D1-D2+1) such that:
P[0,0,0] = dot(X[0,0:D2], Y[0,:])
P[0,0,1] = dot(X[0,1:D2+1], Y[0,:])
...
P[N-1,M-1,D1-D2] = dot(X[N-1,D1-D2:D1], Y[M-1,:])
I can create a for loop and manually slide Y and calculate the dot products.
However I prefer using the correlation operator.
As far as I know, TensorFlow implements a correlation operator (https://www.tensorflow.org/versions/master/api_docs/python/nn/convolution), but I don't know how to use my tensors as its input and filter.
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
In your case, I'd set strides to 1 and padding to 'SAME'.
tf.nn.conv2d(X, Y, strides=[1, 1, 1, 1], padding='SAME')
Yes, you can indeed use tf.nn.conv2d(), but you need to add both batch and channel dimensions:
X = tf.expand_dims(tf.expand_dims(X,0),-1)
# X.shape [batch=1, in_height, in_width, in_channels=1]
Y = tf.expand_dims(tf.expand_dims(Y,-1),-1)
# Y.shape = [filter_height, filter_width, in_channels=1, out_channels=1]
# Convolution (actually correlation, see doc of conv2d)
xcorr = tf.nn.conv2d(X, Y, padding="VALID", strides=[1, 1, 1, 1])
# Padding should be VALID, since you've already padded your input
CAVEAT: This approach does not extend to batches of signals, because tf.nn.conv2d always applies the same filter across the batch dimension, and as I understand it you want a different filter per signal.
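As a reference for what P should contain, the sliding inner product can be written directly in NumPy and used to check any convolution-based version. This is a sketch, not one of the answers above: sliding_inner_product is a hypothetical helper, and the data is random. In TensorFlow the same contraction should correspond to a 1-D convolution with X reshaped to [N, D1, 1] and the transpose of Y reshaped to [D2, 1, M], though that mapping is not tested here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_inner_product(X, Y):
    """Return P with P[n, m, k] = dot(X[n, k:k+D2], Y[m, :])."""
    D2 = Y.shape[1]
    # All length-D2 windows of every row of X: shape (N, D1 - D2 + 1, D2)
    windows = sliding_window_view(X, D2, axis=1)
    # Contract the window axis against every row of Y.
    return np.einsum("nkd,md->nmk", windows, Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 10))   # N = 4, D1 = 10
Y = rng.normal(size=(3, 6))    # M = 3, D2 = 6
P = sliding_inner_product(X, Y)   # shape (N, M, D1 - D2 + 1)
```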

Doing pairwise distance computation with TensorFlow

I'm trying to implement this article:
http://ronan.collobert.com/pub/matos/2008_deep_icml.pdf
Specifically, equation (3) from section 2.
In short, I want to do a pairwise distance computation on the features of each mini-batch and add this loss to the general network loss.
I only have the tensor of the batch (16 samples), the labels tensor of the batch, and the batch feature tensor.
After looking for quite a while, I still couldn't figure out the following:
1) How do I divide the batch into positive (i.e. same label) and negative pairs? Since tensors are not iterable, I can't figure out how to find which sample has which label and then divide my vector, or how to get which indices of the tensor belong to each class.
2) How can I do the pairwise distance calculation for some of the indices in the batch tensor?
3) I also need to define a new distance function for negative examples.
Overall, I need to find which indices belong to which class, do a positive pairwise distance calculation for all positive pairs, and do another calculation for all negative pairs. Then sum it all up and add it to the network loss.
Any help (with one or more of the 3 issues) would be highly appreciated.
1)
You should do the pair sampling before feeding the data into a session. Give every pair a boolean label, say y = 1 for a matched pair and 0 otherwise.
2) 3) Just calculate both the positive and the negative term for every pair, and let the 0-1 label y choose which one is added to the loss.
First create placeholders; y_ is for the boolean labels.
dim = 64
x1_ = tf.placeholder('float32', shape=(None, dim))
x2_ = tf.placeholder('float32', shape=(None, dim))
y_ = tf.placeholder('uint8', shape=[None]) # uint8 for boolean
Then the loss tensor can be created by the function.
def loss(x1, x2, y):
    # Euclidean distance between x1,x2
    l2diff = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x1, x2)), axis=1))
    # you can try margin parameters
    margin = tf.constant(1.)
    labels = tf.cast(y, tf.float32)
    match_loss = tf.square(l2diff, 'match_term')
    mismatch_loss = tf.maximum(0., tf.subtract(margin, tf.square(l2diff)), 'mismatch_term')
    # if label is 1, only match_loss will count, otherwise mismatch_loss
    loss = tf.add(tf.multiply(labels, match_loss),
                  tf.multiply((1 - labels), mismatch_loss), 'loss_add')
    loss_mean = tf.reduce_mean(loss)
    return loss_mean
loss_ = loss(x1_, x2_, y_)
Then feed your data (random generated for example):
batchsize = 4
x1 = np.random.rand(batchsize, dim)
x2 = np.random.rand(batchsize, dim)
y = np.array([0,1,1,0])
sess = tf.Session()
l = sess.run(loss_, feed_dict={x1_:x1, x2_:x2, y_:y})
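If you would rather use every pair inside the mini-batch than pre-sampled pairs, the split into positive and negative pairs can be done with broadcasting instead of iteration. The NumPy sketch below uses the same match and mismatch terms as the loss above; pairwise_contrastive_loss is a hypothetical name and the three feature vectors are made up. Each operation used here (broadcast subtraction, equality comparison, masking) has a TensorFlow counterpart, but that translation is an assumption and is not shown.

```python
import numpy as np

def pairwise_contrastive_loss(features, labels, margin=1.0):
    """Mean loss over all unordered pairs in a batch.

    features: [batch, dim] array, labels: [batch] integer class ids.
    """
    # Squared Euclidean distance between every pair of rows: [batch, batch]
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # same[i, j] is True when samples i and j share a label.
    same = labels[:, None] == labels[None, :]
    # Count each unordered pair once and skip self-pairs.
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)
    pos = sq_dist[same & upper]
    neg = np.maximum(0.0, margin - sq_dist[~same & upper])
    return (pos.sum() + neg.sum()) / upper.sum()

features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.5]])
labels = np.array([0, 0, 1])
batch_loss = pairwise_contrastive_loss(features, labels)
```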
Short answer
I think the simplest way to do that is to sample the pairs offline (i.e. outside of the TensorFlow graph).
You create tf.placeholder for a batch of pairs along with their labels (positive or negative, i.e. same class or different class), and then you can compute in TensorFlow the corresponding loss.
With the code
You sample the pairs offline. You sample batch_size pairs of inputs, and output the left and right elements of the pairs, each of shape [batch_size, input_size]. You also output the labels of the pairs (either positive or negative), of shape [batch_size,].
pairs_left = np.zeros((batch_size, input_size))
pairs_right = np.zeros((batch_size, input_size))
labels = np.zeros((batch_size,)) # ex: [0., 1., 1., 0.] for batch_size=4
Then you create TensorFlow placeholders corresponding to these inputs. In your code, you feed the arrays above to these placeholders through the feed_dict argument of sess.run().
pairs_left_node = tf.placeholder(tf.float32, [batch_size, input_size])
pairs_right_node = tf.placeholder(tf.float32, [batch_size, input_size])
labels_node = tf.placeholder(tf.float32, [batch_size])
Now we can perform a feedforward on the inputs (let's say your model is a linear model).
W = ... # shape [input_size, feature_size]
output_left = tf.matmul(pairs_left_node, W) # shape [batch_size, feature_size]
output_right = tf.matmul(pairs_right_node, W) # shape [batch_size, feature_size]
Finally we can compute the pairwise loss.
margin = 1.0  # margin for negative pairs; tune for your data
l2_loss_pairs = tf.reduce_sum(tf.square(output_left - output_right), 1)
positive_loss = l2_loss_pairs
negative_loss = tf.nn.relu(margin - l2_loss_pairs)
final_loss = tf.multiply(labels_node, positive_loss) + tf.multiply(1. - labels_node, negative_loss)
And that's it! You can now optimize this loss, provided the offline sampling is good.
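The offline sampling step itself is not spelled out above. One simple possibility is sketched below in NumPy: draw two random indices per pair and derive the label from the class ids. The function sample_pairs and the toy data are hypothetical. Uniform sampling tends to produce mostly negative pairs when there are many classes, so a real sampler would probably balance positives and negatives.

```python
import numpy as np

def sample_pairs(data, labels, batch_size, rng):
    """Draw batch_size random pairs; label 1.0 = same class, 0.0 = different."""
    idx_left = rng.integers(0, len(data), size=batch_size)
    idx_right = rng.integers(0, len(data), size=batch_size)
    pair_labels = (labels[idx_left] == labels[idx_right]).astype(np.float32)
    return data[idx_left], data[idx_right], pair_labels

rng = np.random.default_rng(0)
class_ids = rng.integers(0, 3, size=100)
# Toy data: every row just repeats its class id, which makes checks easy.
data = np.repeat(class_ids[:, None], 8, axis=1).astype(np.float32)

pairs_left, pairs_right, pair_labels = sample_pairs(data, class_ids, 16, rng)
```

The three returned arrays have shapes [batch_size, input_size], [batch_size, input_size] and [batch_size], ready to be fed to placeholders of matching shapes.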

How can I visualize the weights(variables) in cnn in Tensorflow?

After training the CNN model, I want to visualize or print out the weights. What can I do?
I cannot even print out the variables after training.
Thank you!
To visualize the weights, you can use a tf.image_summary() op to transform a convolutional filter (or a slice of a filter) into a summary proto, write it to a log using a tf.train.SummaryWriter, and visualize the log using TensorBoard. (In TensorFlow 1.x these were renamed tf.summary.image() and tf.summary.FileWriter.)
Let's say you have the following (simplified) program:
# conv2d filters are 4-D: [height, width, in_channels, out_channels]
filter = tf.Variable(tf.truncated_normal([8, 8, 3, 1]))
# conv2d inputs are 4-D: [batch, height, width, channels]
images = tf.placeholder(tf.float32, shape=[None, 28, 28, 3])
conv = tf.nn.conv2d(images, filter, strides=[1, 1, 1, 1], padding="SAME")
# More ops...
loss = ...
optimizer = tf.train.GradientDescentOptimizer(0.01)
train_op = optimizer.minimize(loss)
# image summaries expect [batch, height, width, channels]
filter_summary = tf.image_summary('filter', tf.transpose(filter, [3, 0, 1, 2]))
sess = tf.Session()
summary_writer = tf.train.SummaryWriter('/tmp/logs', sess.graph_def)
for i in range(10000):
    sess.run(train_op)
    if i % 10 == 0:
        # Log a summary every 10 steps.
        summary_writer.add_summary(sess.run(filter_summary), i)
After doing this, you can start TensorBoard to visualize the logs in /tmp/logs, and you will be able to see a visualization of the filter.
Note that this trick visualizes depth-3 filters as RGB images (to match the channels of the input image). If you have deeper filters, or they don't make sense to interpret as color channels, you can use the tf.split() op to split the filter on the depth dimension, and generate one image summary per depth.
Like @mrry said, you can use tf.image_summary. For example, for cifar10_train.py, you can put this code somewhere under def train(). Note how you access a variable under the scope 'conv1':
# Visualize conv1 features
with tf.variable_scope('conv1') as scope_conv:
    weights = tf.get_variable('weights')
    # scale weights to [0 255] and convert to uint8 (maybe change scaling?)
    x_min = tf.reduce_min(weights)
    x_max = tf.reduce_max(weights)
    weights_0_to_1 = (weights - x_min) / (x_max - x_min)
    weights_0_to_255_uint8 = tf.image.convert_image_dtype(weights_0_to_1, dtype=tf.uint8)
    # to tf.image_summary format [batch_size, height, width, channels]
    weights_transposed = tf.transpose(weights_0_to_255_uint8, [3, 0, 1, 2])
    # this will display random 3 filters from the 64 in conv1
    tf.image_summary('conv1/filters', weights_transposed, max_images=3)
If you want to visualize all your conv1 filters in one nice grid, you would have to organize them into a grid yourself. I did that today, so now I'd like to share a gist for visualizing conv1 as a grid
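The gist itself is not reproduced here. As an independent illustration of the idea (not the author's code), the NumPy sketch below tiles a weight array of shape [height, width, channels, count] into a single image with a one-pixel border between filters. filters_to_grid is a hypothetical helper, and random values stand in for trained weights.

```python
import numpy as np

def filters_to_grid(weights, pad=1):
    """Tile conv filters of shape [height, width, channels, count] into one image."""
    h, w, c, n = weights.shape
    # Scale all weights jointly to [0, 1], as in the snippet above.
    w_min, w_max = weights.min(), weights.max()
    scaled = (weights - w_min) / (w_max - w_min + 1e-12)
    # Roughly square layout.
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    grid = np.zeros((rows * (h + pad) + pad, cols * (w + pad) + pad, c),
                    dtype=scaled.dtype)
    for i in range(n):
        r, col = divmod(i, cols)
        top = pad + r * (h + pad)
        left = pad + col * (w + pad)
        grid[top:top + h, left:left + w, :] = scaled[:, :, :, i]
    return grid

weights = np.random.default_rng(0).normal(size=(5, 5, 3, 64))
grid = filters_to_grid(weights)   # one image containing all 64 filters
```

The resulting array can be shown with any image viewer, or given a leading batch dimension and passed to an image summary.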
You can extract the values as numpy arrays the following way:
with tf.variable_scope('conv1', reuse=True) as scope_conv:
    W_conv1 = tf.get_variable('weights', shape=[5, 5, 1, 32])
    weights = W_conv1.eval()
    # np.save writes binary data, so open the file in binary mode
    with open("conv1.weights.npz", "wb") as outfile:
        np.save(outfile, weights)
Note that you have to adjust the scope ('conv1' in my case) and the variable name ('weights' in my case).
Then it boils down to visualizing NumPy arrays. One example of how to do that:
#!/usr/bin/env python
"""Visualize numpy arrays."""
import numpy as np
import matplotlib.pyplot as plt
arr = np.load('conv1.weights.npz')
# Get each 5x5 filter from the 5x5x1x32 array
for filter_ in range(arr.shape[3]):
    # Get the 5x5x1 filter:
    extracted_filter = arr[:, :, :, filter_]
    # Get rid of the last dimension (hence get 5x5):
    extracted_filter = np.squeeze(extracted_filter)
    # display the filter (might be very small - you can resize the window)
    plt.imshow(extracted_filter, cmap='gray')
    plt.show()
Using the TensorFlow 2 API, there are several options:
Weights can be extracted using the get_weights() function:
weights_n = model.layers[n].get_weights()[0]
The bias can be extracted using the numpy() conversion function:
bias_n = model.layers[n].bias.numpy()
