tensorflow resize image by random factor - python

I am trying to resize an image by a factor during graph construction when the image size is unknown:
H, W, _ = img.get_shape()
scale = tf.random_uniform([1], minval=1, maxval=1.5, dtype=tf.float32, seed=None, name=None)
Out of these I need to magically compute a size which would translate to the following:
tf.image.resize_images(tf.expand_dims(img, 0), [H*scale, W*scale])
Which returns:
ValueError: 'size' must be a 1-D int32 Tensor
Any help is greatly appreciated. Thank you.

Your scale has shape (1,), so [H*scale, W*scale] becomes a tensor of shape (2, 1) rather than the required 1-D shape (2,). To fix it, just generate a scalar random value instead of a 1-element vector:
scale = tf.random_uniform([], minval=1, maxval=1.5, dtype=tf.float32, seed=None, name=None)
Note the shape is an empty list, meaning you want a scalar.
In addition to that, you have to fix the data type of the size parameter, like this:
tf.image.resize_images(tf.expand_dims(img, 0), tf.cast([H*scale, W*scale], tf.int32))
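Putting both fixes together, a minimal sketch could look like this (assuming a TF 1.x graph and a 3-channel image fed through a placeholder; it uses tf.shape so it also works when the static shape is unknown):
import tensorflow as tf

# example image whose height/width are unknown at graph-construction time
img = tf.placeholder(tf.float32, shape=[None, None, 3])

# dynamic shape works even when the static shape is unknown
shape = tf.shape(img)
H = tf.cast(shape[0], tf.float32)
W = tf.cast(shape[1], tf.float32)

# scalar random factor (note the empty shape [])
scale = tf.random_uniform([], minval=1, maxval=1.5, dtype=tf.float32)

# 1-D int32 size tensor, as required by resize_images
new_size = tf.cast([H * scale, W * scale], tf.int32)

resized = tf.image.resize_images(tf.expand_dims(img, 0), new_size)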

Your problem is that you're mixing the python list [H*scale, W*scale] with tensors. By default, tensorflow converts lists like this to tensorflow constants as appropriate, but in this case your list contains tensors, so you end up with a nested tensor that isn't 1-D.
To avoid confusion in cases like this, stop using python constructs such as the list; convert your height and width into tensorflow constructs explicitly, and check their shape before proceeding:
x = tf.concat((scale * H, scale * W), axis=0)  # height first, since resize_images expects [new_height, new_width]
print(x)
Tensor("concat_3:0", shape=(2,), dtype=float32)
Doing so shows us that we now have a 1D tensor as required. But it's float32 (at least it was in the simple test case I set up), so let's cast that to an int:
x = tf.cast(x, tf.int32)
Now you're ready to go
tf.image.resize_images(tf.expand_dims(img, 0), size=x)
No error should occur there.

Related

In Pytorch, how do you multiply a (b, c, h, w) size tensor with a tensor of size (c)

I have to normalize a tensor of size (b, c, h, w) with two tensors of size (c) which represent the respective mean and standard deviation.
I cannot manage to figure out how to multiply a tensor of shape, let say torch.Size([1, 3, 128, 128]) with a tensor of shape torch.Size([3]).
What I want to accomplish is: take the first element of the smaller tensor and multiply the first [128x128] part of the larger tensor with it. And do this for the second element and second [128x128] tensor etc.
def normalize(img, mean, std):
    """ Normalizes an image tensor.
    # Parameters:
    # img, torch.tensor of size (b, c, h, w)
    # mean, torch.tensor of size (c)
    # std, torch.tensor of size (c)
    # Returns the normalized image
    """
    # TODO: 1. Implement normalization doing channel-wise z-score normalization.
    img * mean  # try1: this doesn't work
    torch.mul(img.view(3, 128, 128), mean)  # try2: this doesn't work
    return img
Both of my attempts throw the same error: RuntimeError: The size of tensor a (128) must match the size of tensor b (3) at non-singleton dimension 3.
I imagine you could create a tensor of the needed size, fill it with the necessary values and multiply by that, but I would imagine there is a better solution than that.
img * mean.reshape(1,3,1,1)
This will reshape the mean tensor so that torch can understand which dimensions you are trying to multiply together.
Edit for details:
Torch matches tensor sizes starting from the last (rightmost) dimension, so it can infer some of the higher dimensions for you (e.g. img * mean.reshape(3,1,1) also works in your case); however, you must specify the lower dimensions to either be one, or match the tensor you are trying to multiply with.
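For completeness, here is a minimal sketch of the channel-wise z-score normalization the TODO asks for, using this reshape idea (the mean/std values are just illustrative):
import torch

def normalize(img, mean, std):
    """Channel-wise z-score normalization of a (b, c, h, w) image tensor."""
    # reshape the (c,) statistics to (1, c, 1, 1) so they broadcast over batch, height and width
    mean = mean.reshape(1, -1, 1, 1)
    std = std.reshape(1, -1, 1, 1)
    return (img - mean) / std

img = torch.rand(1, 3, 128, 128)
mean = torch.tensor([0.485, 0.456, 0.406])  # illustrative per-channel values
std = torch.tensor([0.229, 0.224, 0.225])
print(normalize(img, mean, std).shape)  # torch.Size([1, 3, 128, 128])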

How to combine tf.map_fn and tf.split

So the pseudocode of the thing I want is:
splitted_outputs = [tf.split(output, rate, axis=0) for output in outputs]
where outputs is a Tensor of shape (512, ?, 128), and splitted_outputs is a list of lists of Tensors (or a Tensor with 3 dimensions), so that I can iterate over it in tensorflow.
I've tried to use tf.map_fn:
splitted_outputs = tf.map_fn(
    lambda output: tf.split(output, rate, axis=0),
    outputs,
    dtype=list
)
but this is not possible because list is not a legal tf dtype.
You can use tf.unstack on outputs to get a list of "subtensors", then use tf.split on each of those:
splitted_outputs = [tf.split(output, rate, axis=0) for output in tf.unstack(outputs, axis=0)]
Note that tf.unstack can only be used like that when the size of the given axis is known, or otherwise you would need to provide a num parameter.
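A small self-contained sketch of this approach, with made-up shapes (4 subtensors of length 6, split into rate = 2 pieces):
import tensorflow as tf

rate = 2
outputs = tf.random_normal([4, 6, 128])  # stand-in for the real (512, ?, 128) tensor

splitted_outputs = [tf.split(output, rate, axis=0)
                    for output in tf.unstack(outputs, axis=0)]

print(len(splitted_outputs))         # 4 lists, one per unstacked subtensor
print(splitted_outputs[0][0].shape)  # each piece has shape (3, 128)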

How to feed input with changing size in Tensorflow

I want to train a network with planar curves, which I represent as numpy arrays with shape (L,2).
The number 2 stands for the x,y coordinates and L is the number of points, which varies across my dataset. I treat x,y as 2 different "channels".
I implemented a function, next_batch(batch_size), that provides the next batch as a 1D numpy array with shape (batch_size,), containing elements which are 2D arrays with shape: (L,2). These are my curves, and as mentioned before, L is different between the elements. (I didn't want to confine to fixed number of points in the curve).
My question:
How can I manipulate the output from next_batch() so I will be able to feed the network with the input curves, using a scheme similar to what appears in the Tensorflow tutorial: https://www.tensorflow.org/get_started/mnist/pros
i.e, using the feed_dict mechanism.
In the given tutorial the input size was fixed; in the tutorial's code line:
train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
batch[0] has a fixed shape: (50, 784) (50 = #samples, 784 = #pixels)
I cannot transform my input into numpy array with shape (batch_size,L,2)
since the array should have fixed size in every dimension.
So what can I do?
I already defined a placeholder (that can have unknown size):
#first dimension is the sample dim, second is curve length, third:x,y coordinates
x = tf.placeholder(tf.float32, [None, None,2])
but how can I feed it properly?
Short answer that you're probably looking for: you can't, without padding or grouping samples by length.
To elaborate a bit: in tensorflow, dimensions must be fixed throughout a batch, and jagged arrays are not natively supported.
Dimensions may be unknown a priori (in which case you set the placeholders' dimensions to None) but are still inferred at runtime, so
your solution of having a placeholder:
x = tf.placeholder(tf.float32, [None, None, 2])
doesn't solve the problem by itself, because it's semantically equivalent to saying "I don't know the constant length of the curves in a batch a priori, infer it at runtime from the data".
This is not to say that your model in general can't accept inputs of different dimensions, if you structure it accordingly, but the data that you feed it each time you call sess.run() must have fixed dimensions.
Your options, then, are as follows:
Pad your batches along the second dimension.
Say that you have 2 curves of shape (4, 2) and (5, 2) and you know the maximum curve length in your dataset is 6; you could use np.pad as follows:
In [1]: max_len = 6
   ...: curve1 = np.random.rand(4, 2)
   ...: curve2 = np.random.rand(5, 2)
   ...: batch = [curve1, curve2]

In [2]: for b in batch:
   ...:     dim_difference = max_len - b.shape[0]
   ...:     print np.pad(b, [(0, dim_difference), (0, 0)], 'constant')
   ...:
[[ 0.92870128  0.12910409]
 [ 0.41894655  0.59203704]
 [ 0.3007023   0.52024492]
 [ 0.47086336  0.72839691]
 [ 0.          0.        ]
 [ 0.          0.        ]]
[[ 0.71349902  0.0967278 ]
 [ 0.5429274   0.19889411]
 [ 0.69114597  0.28624011]
 [ 0.43886002  0.54228625]
 [ 0.46894651  0.92786989]
 [ 0.          0.        ]]
Have your next_batch() function return batches of curves grouped by length (a minimal sketch of this is shown below).
These are the standard ways of doing things when dealing with jagged arrays.
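A minimal sketch of the second option, assuming next_batch() draws from a plain Python list of (L, 2) numpy arrays (the function name and bucketing strategy here are illustrative, not from the original code):
import numpy as np
from collections import defaultdict

def batches_grouped_by_length(curves, batch_size):
    # bucket the curves by their length L
    buckets = defaultdict(list)
    for curve in curves:  # each curve has shape (L, 2)
        buckets[curve.shape[0]].append(curve)
    # yield fixed-shape (n, L, 2) batches from each bucket
    for same_length in buckets.values():
        for i in range(0, len(same_length), batch_size):
            yield np.stack(same_length[i:i + batch_size])

curves = [np.random.rand(np.random.randint(3, 7), 2) for _ in range(10)]
for batch in batches_grouped_by_length(curves, batch_size=4):
    print(batch.shape)  # e.g. (4, 5, 2), fixed L within each batch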
Another possibility, if your task allows for it, is to concatenate all your points in a single tensor of shape (None, 2) and change your model to operate on single points as if they were samples in a batch. If you save the original sample lengths in a separate array, you can then restore the model outputs by slicing them correctly. This is highly inefficient and requires all sorts of assumptions on your problem, but it's a possibility.
Cheers and good luck!
You can use inputs with different sizes in TF. Just feed the data in the same way as in the tutorial you listed, but make sure to define the changing dimensions in the placeholder as None.
Here's a simple example of feeding a placeholder with different shapes:
import tensorflow as tf
import numpy as np
array1 = np.arange(9).reshape((3,3))
array2 = np.arange(16).reshape((4,4))
array3 = np.arange(25).reshape((5,5))
model_input = tf.placeholder(dtype='float32', shape=[None, None])
sqrt_result = tf.sqrt(model_input)
with tf.Session() as sess:
    print sess.run(sqrt_result, feed_dict={model_input: array1})
    print sess.run(sqrt_result, feed_dict={model_input: array2})
    print sess.run(sqrt_result, feed_dict={model_input: array3})
You can use a placeholder and initialize it with shape [None, ..., None]. Each None means that input of any size can be fed at that dimension. For example, [None, None] means a matrix with any number of rows and columns can be fed. However, you should take care about which kind of NN you use, because when you deal with a CNN, at the convolution and pooling layers you must identify the specific size of the tensor.
Tensorflow Fold might be of interest to you.
From the Tensorflow Fold README:
TensorFlow Fold is a library for creating TensorFlow models that consume structured data, where the structure of the computation graph depends on the structure of the input data. Fold implements dynamic batching. Batches of arbitrarily shaped computation graphs are transformed to produce a static computation graph. This graph has the same structure regardless of what input it receives, and can be executed efficiently by TensorFlow.
The graph structure can be set up so as to accept an arbitrary L value so that any structured input can be read in. This is especially helpful when building architectures such as recursive neural nets. The overall structure is very similar to what you are used to (feed dicts, etc). Since you need a dynamic computational graph for your application, this might be a good move for you in the long run.

regarding the tensor shape is (?,?,?,1)

While debugging the Tensorflow code, I would like to output the shape of a tensor, say, print("mask's shape is: ", mask.get_shape()). However, the corresponding output is mask's shape is (?, ?, ?, 1). How should this kind of output be interpreted, and is there any way to know the exact values of the first three dimensions of this tensor?
This output means that TensorFlow's shape inference has only been able to infer a partial shape for the mask tensor. It has been able to infer (i) that mask is a 4-D tensor, and (ii) its last dimension is 1; but it does not know statically the shape of the first three dimensions.
If you want to get the actual shape of the tensor, the main approaches are:
Compute mask_val = sess.run(mask) and print mask_val.shape.
Create a symbolic mask_shape = tf.shape(mask) tensor, compute mask_shape_val = sess.run(mask_shape) and print mask_shape_val.
Shapes usually have unknown components if the shape depends on the data, or if the tensor is itself a function of some tensor(s) with a partially known shape. If you believe that the shape of the mask should be static, you can trace the source of the uncertainty by (recursively) looking at the inputs of the operation(s) that compute mask and finding out where the shape becomes partially known.
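A short self-contained sketch of both approaches, with a made-up mask tensor whose shape depends on a placeholder (TF 1.x style, as in the question):
import numpy as np
import tensorflow as tf

images = tf.placeholder(tf.float32, [None, None, None, 3])
mask = tf.reduce_mean(images, axis=3, keep_dims=True)  # static shape: (?, ?, ?, 1)

print(mask.get_shape())      # (?, ?, ?, 1)

mask_shape = tf.shape(mask)  # symbolic shape, evaluated at runtime
with tf.Session() as sess:
    feed = {images: np.zeros((2, 28, 28, 3), dtype=np.float32)}
    mask_val = sess.run(mask, feed)
    print(mask_val.shape)              # (2, 28, 28, 1)
    print(sess.run(mask_shape, feed))  # [ 2 28 28  1]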

Get output from Lasagne (python deep neural network framework)

I loaded the mnist_conv.py example from the official Lasagne GitHub.
At the end, I would like to predict on my own example. From the official documentation I saw that lasagne.layers.get_output() should handle numpy arrays, but it doesn't work and I cannot figure out how to do that.
Here's my code:
if __name__ == '__main__':
    output_layer = main()  # the output layer from the net
    exampleChar = np.zeros((28, 28))  # the example I would predict
    outputValue = lasagne.layers.get_output(output_layer, exampleChar)
    print(outputValue.eval())
but it gives me:
TypeError: ConvOp (make_node) requires input be a 4D tensor; received "TensorConstant{(28, 28) of 0.0}" (2 dims)
I understand that it expects a 4D tensor, but I don't have any idea how to correct it.
Can you help me? Thanks
First you try to pass a single "image" into your network, so it has the dimensions (256, 256).
But it needs a list of 3-dimensional data, i.e. images, which in Theano is implemented as a 4D tensor.
I don't see your full code or how you intended to use lasagne's interface, but if your code is written properly, from what I saw so far, I think you should first convert your (256, 256) data into a single-channel image like (1, 256, 256), and then make a list out of it: either pass several (1, 256, 256) arrays in a list, e.g. [(1, 256, 256), (1, 256, 256), (1, 256, 256)], or make a list from this single example, like [(1, 256, 256)].
In the former case you get and then pass a (3, 1, 256, 256) 4D tensor, in the latter a (1, 1, 256, 256) one, which will be accepted by the lasagne interface.
As written in your error message, the input is expected to be a 4D tensor, of shape (n_samples, n_channel, width, height). In the MNIST case, n_channels is 1, and width and height are 28.
But you are inputting a 2D tensor, of shape (28, 28). You need to add new axes, which you can do with exampleChar = exampleChar[None, None, :, :]
exampleChar = np.zeros((28, 28))
print exampleChar.shape
exampleChar = exampleChar[None, None, :, :]
print exampleChar.shape
outputs
(28, 28)
(1, 1, 28, 28)
Note: I think you can use np.newaxis instead of None to add an axis. And exampleChar = exampleChar[None, None] should work too.
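For reference, the question's snippet with that fix applied might look like the sketch below (assuming, as in the question, that main() returns the trained output layer):
import numpy as np
import lasagne

if __name__ == '__main__':
    output_layer = main()  # the trained output layer from the net

    # one 28x28 example, reshaped to the expected 4D (n_samples, n_channels, height, width)
    exampleChar = np.zeros((28, 28), dtype=np.float32)
    exampleChar = exampleChar[None, None, :, :]  # shape (1, 1, 28, 28)

    # deterministic=True disables stochastic layers such as dropout at prediction time
    outputValue = lasagne.layers.get_output(output_layer, exampleChar, deterministic=True)
    print(outputValue.eval())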
