TensorFlow: using dynamic shape when defining models - python

I have a batch of input:
input = tf.placeholder(tf.float32, [NUM_SAMPLE, None, 15])
For each sample in the batch, I have a dictionary that describes the relationships between rows. It looks like:
dic = {i:{j:rij,k:rik,...},j:{i:rij,l:rjl,...},...}
Now I want to do the following for each sample and its corresponding dic:
updated_sample = sample
for i in range(len(sample)):
    for j in dic[i]:
        tmp = concatenate(sample[j], rij)
        updated_sample[i] += matmul(tmp, W)
where W is shared by all samples and rows.
However, I cannot use len(sample) in TensorFlow. It seems tf.while_loop may be the answer, but I don't know how to use it for this problem. Any suggestions?
Besides, can I use a dictionary in this way in TensorFlow?

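There are two TensorFlow analogs of len(sample):
tf.shape(sample)[0]
sample.get_shape().as_list()[0]
The first one, tf.shape(sample), returns a 1-D integer tensor whose length equals the rank of sample, so tf.shape(sample)[0] is a scalar tensor (shape ()). Its value is only known when the graph runs, so it has to be used inside the TensorFlow workflow.
The second one, sample.get_shape(), returns a tf.TensorShape object, and sample.get_shape().as_list() converts it into a Python list of integers, with None for any dimension that is not known statically.
In your case, the second form is the one that gives you a plain Python integer for range(), but only when that dimension is fixed when the graph is built. For a dimension declared as None (like the rows of each sample in your placeholder), as_list() returns None, and you have to use tf.shape(sample)[0] together with tf.while_loop instead.
Consider also doing these computations at the NumPy level and then feeding the results into the graph through placeholders.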
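The two forms behave differently once a dimension is dynamic. A minimal sketch, assuming TensorFlow 2.x, where a tf.function with an input_signature plays the role of a placeholder whose row count is None. The per-row work in body (summing the rows) is a hypothetical stand-in for the question's concatenate/matmul update.

```python
import tensorflow as tf

# Assumption: TF 2.x; the input_signature mimics a placeholder with a None row dimension.
@tf.function(input_signature=[tf.TensorSpec([None, 15], tf.float32)])
def count_and_sum_rows(sample):
    # Static shape: the row dimension is unknown when the function is traced.
    static_rows = sample.get_shape().as_list()[0]
    assert static_rows is None  # checked once, at trace time

    # Dynamic shape: a scalar int32 tensor whose value is known only at run time.
    n = tf.shape(sample)[0]

    def cond(i, acc):
        return i < n

    def body(i, acc):
        # Hypothetical per-row work: accumulate row i.
        return i + 1, acc + sample[i]

    _, total = tf.while_loop(cond, body, [tf.constant(0), tf.zeros([15])])
    return n, total

n, total = count_and_sum_rows(tf.ones([4, 15]))
```

In graph code, range(len(sample)) turns into the cond/body pair, and the loop bound n is a tensor rather than a Python int.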

Related

How Do I Use Bisect to Classify a Value?

I have various heights (z) that I want to classify into different layers. From what I found online, the bisect_left function seems to be a good way of finding where in a list a value should appear. I have defined the name of each layer to be the position of a given z value in the list; for example, z=15.75 would correspond to layer 1. I have written the function below to implement this.
import bisect

def layer_update(particle):
    layers = [20, 15.75, 14.75, 10.5, 9.5, 5.25, 4.25]
    z = particle["z"]
    if z in layers:
        particle["layer"] = bisect.bisect_left(layers, z)
    else:
        particle["layer"] = bisect.bisect_left(layers, z) - 1
    return particle
The if statement is included because my values of z will generally be exactly the values in the layers list. However, when I run the lines
layers = [20,15.75,14.75,10.5,9.5,5.25,4.25]
bisect.bisect_left(layers,20)
I expect to get 0, but I get 7. I've read the documentation for the bisect function and can't see why this might be the case. Can anybody help?
P.S. I suspect this is because the layers list is in descending order. Other parts of my code rely on that order, so ideally any solution will keep the list as is.
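bisect assumes the list is sorted in ascending order; on the descending list every comparison says "20 is larger", so the search runs off the end and returns 7. A minimal sketch that keeps layers in its original descending order and searches a negated copy instead. The helper name layer_index is hypothetical; it reproduces the question's numbering (15.75 is layer 1, and a value between two boundaries takes the index of the upper one).

```python
import bisect

# Descending boundaries, kept in the original order for the rest of the code.
LAYERS = [20, 15.75, 14.75, 10.5, 9.5, 5.25, 4.25]
# bisect needs ascending order, so search a negated copy instead.
_NEG_LAYERS = [-v for v in LAYERS]

def layer_index(z):
    """Index of the last boundary that is >= z (hypothetical helper)."""
    # bisect_right counts negated boundaries <= -z, i.e. original boundaries >= z;
    # subtracting 1 turns that count into an index.
    return bisect.bisect_right(_NEG_LAYERS, -z) - 1

def layer_update(particle):
    particle["layer"] = layer_index(particle["z"])
    return particle
```

Using bisect_right minus 1 covers both branches of the original if, because an exact match is counted among the boundaries >= z. On Python 3.10+, bisect.bisect_right(LAYERS, -z, key=lambda v: -v) - 1 gives the same result without the copy, since key is applied to the list elements but not to the searched value.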

tf.gather runs out of bounds while using a custom softmax_loss_function, even though it shouldn't

I'm using a small custom function as the softmax_loss_function argument of tf.contrib.seq2seq.sequence_loss(softmax_loss_function=[...]):
def reduced_softmax_loss(self, labels, logits):
    top_logits, indices = tf.nn.top_k(logits, self.nb_top_classes, sorted=False)
    top_labels = tf.gather(labels, indices)
    return tf.nn.softmax_cross_entropy_with_logits_v2(labels=top_labels,
                                                      logits=top_logits)
But even though labels and logits should have the same dimensions, execution fails with an InvalidArgumentError:
indices[1500,1] = 2158 is not in [0, 1600)
(the numbers vary with my random seed).
Is there another function like tf.gather that I could use instead? Or does it return a value of the wrong shape?
Everything works fine if I pass the usual TensorFlow functions.
Thanks in advance!
It's hard to tell what's going on just by looking at your code, but I don't think it does what you want it to do. tf.gather expects an indices input where each scalar value indexes into the outermost dimension of the first argument, but here the output of top_k is meant to index into both the rows and the columns, which leads to out-of-bounds errors.
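A sketch of a per-row gather, assuming TensorFlow 2.x (tf.gather accepts batch_dims from 1.14 on). It also assumes labels arrives as integer class IDs of shape [N], which appears to be how sequence_loss passes its flattened targets to a custom softmax_loss_function (and would explain the [0, 1600) range in the error), so the labels are one-hot encoded before gathering. The function and parameter names are illustrative.

```python
import tensorflow as tf

def reduced_softmax_loss(labels, logits, nb_top_classes):
    """Softmax cross-entropy restricted to each row's top-k logits.

    labels: int class IDs, shape [N]; logits: float, shape [N, num_classes].
    """
    num_classes = tf.shape(logits)[-1]
    # Dense [N, num_classes] targets so that columns can be gathered per row.
    dense_labels = tf.one_hot(labels, depth=num_classes, dtype=logits.dtype)
    top_logits, indices = tf.nn.top_k(logits, k=nb_top_classes, sorted=False)
    # batch_dims=1: row i of `indices` only indexes into row i of dense_labels,
    # giving top_labels the same [N, k] shape as top_logits.
    top_labels = tf.gather(dense_labels, indices, batch_dims=1)
    return tf.nn.softmax_cross_entropy_with_logits(labels=top_labels,
                                                   logits=top_logits)

logits = tf.constant([[1.0, 5.0, 3.0], [4.0, 0.0, 2.0]])
labels = tf.constant([1, 0])
loss = reduced_softmax_loss(labels, logits, nb_top_classes=2)
```

If the true class is not among the top k, its gathered label row is all zeros and that position contributes zero loss, which may or may not be what you want.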

how to loop through each row in a tensor in tensorflow

I have a 2-D tensor in TensorFlow, for example the 2x4 tensor [[1.,2.,3.,4.],[2.,4.,5.,6.]].
I have a function a() that I want to apply to each row of the tensor, and then sum all the results of a(). How can I do this inside the graph (without running it in a session)?
The output should be a([1.,2.,3.,4.]) + a([2.,4.,5.,6.]); in practice my tensor is very large, with many rows.
This is different from reduce_sum, because a() is quite complex and cannot be directly vectorized.
Many thanks!
Perhaps what you're looking for is TensorFlow's map_fn function. map_fn(a, elems) unpacks the tensor elems along its first dimension into a sequence of slices, applies the supplied function a to each slice, and then stacks the outputs back into a single tensor along the first dimension.
It sounds like what you want is
Y = tf.map_fn(a, X)
answer = tf.reduce_sum(Y, axis=0)
where X is your tensor.
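A runnable version with the example tensor, assuming TensorFlow 2.x eager execution. The body of a below is a hypothetical stand-in for the real, more complex function; map_fn only needs it to map one row to a tensor of the same shape every time.

```python
import tensorflow as tf

def a(row):
    # Hypothetical per-row function: sum of squares of the row.
    return tf.reduce_sum(tf.square(row))

X = tf.constant([[1., 2., 3., 4.], [2., 4., 5., 6.]])
Y = tf.map_fn(a, X)                # shape [2]: one result per row
answer = tf.reduce_sum(Y, axis=0)  # a(X[0]) + a(X[1])
```

Here Y holds 30 and 81, and answer is their sum, 111.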

num_buckets as a parameter in a tensorflow feature column

Currently the TensorFlow documentation defines a categorical vocabulary column this way:
vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_list(
    key="feature_name_from_input_fn",
    vocabulary_list=["kitchenware", "electronics", "sports"])
However, this assumes that we enter the vocabulary list manually.
For a large dataset with many columns and many unique values, I would like to automate the process like this:
for k in categorical_feature_names:
    vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_list(
        key=k,
        vocabulary_list=list_of_unique_values_in_the_column)
To do so I need to retrieve list_of_unique_values_in_the_column.
Is there any way to do that with TensorFlow?
I know tf.unique returns the unique values in a tensor, but I don't see how to feed the column to it so that it returns the right vocabulary list.
If list_of_unique_values_in_the_column is known, you can save the values to a file (one per line) and read it with tf.feature_column.categorical_column_with_vocabulary_file. If it is unknown, you can use tf.feature_column.categorical_column_with_hash_bucket with a large enough hash_bucket_size.
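If the unique values can be computed outside the graph, one way is to write one vocabulary file per column. A sketch in plain Python, assuming the rows are available as dicts (a hypothetical in-memory format; with pandas, df[col].unique() would do the same job); write_vocab_files is a hypothetical helper.

```python
import os
import tempfile

def write_vocab_files(rows, categorical_feature_names, out_dir=None):
    """Write one vocabulary file per categorical column.

    rows: iterable of dicts mapping column name -> value (assumed format).
    Returns {column_name: (path, vocabulary_size)}.
    """
    if out_dir is None:
        out_dir = tempfile.mkdtemp()
    uniques = {name: set() for name in categorical_feature_names}
    for row in rows:
        for name in categorical_feature_names:
            uniques[name].add(str(row[name]))
    result = {}
    for name, values in uniques.items():
        vocab = sorted(values)  # sorted only to make the file deterministic
        path = os.path.join(out_dir, f"{name}_vocab.txt")
        with open(path, "w", encoding="utf-8") as f:
            for value in vocab:
                f.write(value + "\n")  # one vocabulary entry per line
        result[name] = (path, len(vocab))
    return result

rows = [
    {"category": "sports", "shop": "a"},
    {"category": "kitchenware", "shop": "b"},
    {"category": "sports", "shop": "a"},
]
vocab_info = write_vocab_files(rows, ["category", "shop"])
```

Each (path, size) pair can then be passed as tf.feature_column.categorical_column_with_vocabulary_file(key=name, vocabulary_file=path, vocabulary_size=size) inside the loop over categorical_feature_names.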

Tensorflow "map operation" for tensor?

I am adapting the cifar10 convolution example to my problem. I'd like to change the data input from a design that reads images one-at-a-time from a file to a design that operates on an already-in-memory set of images. The original inputs() function looks like this:
read_input = cifar10_input.read_cifar10(filename_queue)
reshaped_image = tf.cast(read_input.uint8image, tf.float32)
# Crop the central [height, width] of the image.
# The op's signature is (image, target_height, target_width).
resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
                                                       height, width)
In the original version, read_input is a tensor containing one image.
I keep all my images in RAM, so instead of using filename_queue, I have one huge images_tensor = tf.constant(images), where images_tensor.shape is (something, 32, 32, 3).
My question is very basic: what is the best way to apply a function (tf.image.resize_image_with_crop_or_pad in my case) to all elements of images_tensor?
Iterating is problematic in TensorFlow, since slicing is limited (see TensorFlow - numpy-like tensor indexing). Is there a way to achieve this with a single command?
As of version 0.8 there is map_fn. From the documentation:
map_fn(fn, elems, dtype=None, parallel_iterations=10, back_prop=True, swap_memory=False, name=None)
map on the list of tensors unpacked from elems on dimension 0.
This map operator repeatedly applies the callable fn to a sequence of elements from first to last. The elements are made of the tensors unpacked from elems. dtype is the data type of the return value of fn. Users must provide dtype if it is different from the data type of elems.
Suppose that elems is unpacked into values, a list of tensors. The shape of the result tensor is [len(values)] + fn(values[0]).shape.
Args:
fn: The callable to be performed.
elems: A tensor to be unpacked to apply fn.
dtype: (optional) The output type of fn.
parallel_iterations: (optional) The number of iterations allowed to run in parallel.
back_prop: (optional) True enables back propagation.
swap_memory: (optional) True enables GPU-CPU memory swapping.
name: (optional) Name prefix for the returned tensors.
Returns:
A tensor that packs the results of applying fn to the list of tensors unpacked from elems, from first to last.
Raises:
TypeError: if fn is not callable.
Example:
elems = [1, 2, 3, 4, 5, 6]
squares = map_fn(lambda x: x * x, elems)
# squares == [1, 4, 9, 16, 25, 36]
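Applied to the question, a sketch assuming TensorFlow 2.x (where the op is named tf.image.resize_with_crop_or_pad), a random stand-in for images_tensor, and a 24x24 target size (the size is an assumption). The result shape follows the rule quoted above: [len(values)] + fn(values[0]).shape.

```python
import tensorflow as tf

# Stand-in for the in-memory images: shape (N, 32, 32, 3).
images_tensor = tf.random.uniform([8, 32, 32, 3])

def crop(image):
    # `image` is one (32, 32, 3) slice unpacked along dimension 0.
    return tf.image.resize_with_crop_or_pad(image, 24, 24)

cropped = tf.map_fn(crop, images_tensor)  # shape (8, 24, 24, 3)
```

In TF 2.x this particular op also accepts a 4-D [batch, height, width, channels] tensor directly, so tf.image.resize_with_crop_or_pad(images_tensor, 24, 24) gives the same result without map_fn; map_fn remains the general tool for ops that only handle one element at a time.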
There are a few answers - none quite as elegant as a map function. Which is best depends a bit on your desire for memory efficiency.
(a) You can use enqueue_many to throw them into a tf.FIFOQueue, then dequeue and tf.image.resize_image_with_crop_or_pad one image at a time, and concat it all back into one big smoosh. This is probably slow: it requires N calls to run for N images.
(b) You could use a single placeholder feed and run to resize and crop them on their way in from your original datasource. This is possibly the best option from a memory perspective, because you never have to store the unresized data in memory.
(c) You could use the tf.control_flow_ops.While op to iterate through the full batch and build up the result in a tf.Variable. Particularly if you take advantage of the parallel execution permitted by while, this is likely to be the fastest approach.
I'd probably go for option (c) unless you want to minimize memory use, in which case filtering it on the way in (option b) would be a better choice.
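Option (c) can be sketched with the current API, assuming TensorFlow 2.x and the same 24x24 target. Instead of building the result in a tf.Variable, this version collects each cropped image in a tf.TensorArray (a swapped-in technique, not the one named above); crop_all is a hypothetical name.

```python
import tensorflow as tf

def crop_all(images, size=24):
    n = tf.shape(images)[0]
    out = tf.TensorArray(images.dtype, size=n)

    def cond(i, out):
        return i < n

    def body(i, out):
        cropped = tf.image.resize_with_crop_or_pad(images[i], size, size)
        return i + 1, out.write(i, cropped)

    # parallel_iterations lets independent iterations overlap in graph mode.
    _, out = tf.while_loop(cond, body, [tf.constant(0), out],
                           parallel_iterations=10)
    return out.stack()  # shape (N, size, size, channels)

result = crop_all(tf.ones([5, 32, 32, 3]))
```

A TensorArray avoids the assign ops a tf.Variable would need, and stack() returns the whole batch as one tensor.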
TensorFlow provides a couple of higher-order functions, and one of them is tf.map_fn. Usage is very easy: you define your mapping and apply it to the tensor:
variable = tf.Variable(...)
mapping = lambda x: f(x)
res = tf.map_fn(mapping, variable)
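A concrete version of the snippet above, assuming TensorFlow 2.3 or later, where fn_output_signature replaces the older dtype argument from the quoted documentation. The mapping (rounding to int32) is a hypothetical example, chosen because its output dtype differs from the input's, which is exactly the case where the output type has to be given.

```python
import tensorflow as tf

variable = tf.Variable([1.2, 2.7, 3.9])
# Hypothetical mapping whose output dtype (int32) differs from the input (float32).
mapping = lambda x: tf.cast(tf.round(x), tf.int32)
res = tf.map_fn(mapping, variable, fn_output_signature=tf.int32)
```

Without fn_output_signature, map_fn would expect float32 outputs matching the input dtype.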
