Tensorflow reading a tab-separated file - python

I am trying to read a tab-separated file into TensorFlow:
# Metadata describing the text columns
COLUMNS = ['queue_name', 'block_name', 'car_name',
           'position_id', 'x_ord',
           'y_ord']
FIELD_DEFAULTS = [[''], [''], [''], [0], [0], [0]]

def _parse_line(line):
    # Decode the line into its fields
    fields = tf.decode_csv(line, FIELD_DEFAULTS, field_delim="\t")

    # Pack the result into a dictionary
    features = dict(zip(COLUMNS, fields))

    # Separate the label from the features
    y = features.pop('y_ord')
    x = features.pop('x_ord')
    return features, x, y

ds = tf.data.TextLineDataset(filenames).skip(1)
ds = ds.map(_parse_line)

with tf.Session() as sess:
    print(sess.run(ds))  # I am getting an error when running the session
However, this gives me an error:
TypeError: Fetch argument <MapDataset shapes: ({period_name: (), block_name: (), trial_name: (), trial_id: ()}, (), ()), types: ({period_name: tf.string, block_name: tf.string, trial_name: tf.string, trial_id: tf.int32}, tf.int32, tf.int32)> has invalid type <class 'tensorflow.python.data.ops.dataset_ops.MapDataset'>, must be a string or Tensor. (Can not convert a MapDataset into a Tensor or Operation.)
Does this mean I cannot combine strings and integers in a MapDataset, or am I doing something wrong?

The reason for the error is that you are trying to run something that is not a Tensor or an Operation but a Dataset object. You can create a tensor from the Dataset object such that every time you run it, you get the next sample from your dataset.
Try the following:
value = ds.make_one_shot_iterator().get_next()
print(sess.run(value)) # First value in your dataset
print(sess.run(value)) # Second value in your dataset
Building up from here, you can construct the rest of your model from this tensor.
See the docs at https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator
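For example, a minimal end-to-end sketch combining this with the pipeline from the question (filenames and _parse_line are assumed to be defined as above):
ds = tf.data.TextLineDataset(filenames).skip(1)
ds = ds.map(_parse_line)
features, x, y = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    # Each run fetches the next parsed row from the file
    print(sess.run([x, y]))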

Related

Tensorflow input function returns invalid values (Tensor instead of Tensor dict)

I have been working on a standard image classification problem with TensorFlow. Most of the code is derived from the tutorials on www.tensorflow.org; the only major change is that I use my own data.
The images are already processed (same size, encoding, etc.) and sorted into appropriate folders called groupA and groupB. While most of the code worked without a problem (images get loaded from disk and decoded, labels are assigned, etc.), I encountered an unexpected roadblock.
labels = tf.constant([1.0 if 'groupA' in filename else 0.1 for filename in training_data])
file_names = tf.constant(training_data)
dataset = tf.data.Dataset.from_tensor_slices((file_names, labels))

def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image = tf.image.decode_png(filename)
    return image, label

dataset = dataset.map(_parse_function)

def create_input_fn_train(dataset):
    def input_fn():
        ds = dataset.shuffle(buffer_size=10000)
        ds = ds.batch(16)
        ds = ds.repeat()
        iterator = ds.make_one_shot_iterator()
        images, labels = iterator.get_next()
        return images, labels
    return input_fn

input_fn_train = create_input_fn_train(dataset)

model = tf.estimator.DNNClassifier(feature_columns=[construct_feature_columns(150,150)],
                                   hidden_units=[1024,100],
                                   optimizer=tf.train.AdamOptimizer(1e-4),
                                   n_classes=2,
                                   dropout=0.1,
                                   model_dir="./tmp/fon_model")
The input function is returning the wrong type of data, causing the following error:
ValueError("features should be a dictionary of `Tensor`s. Given type: <class 'tensorflow.python.framework.ops.Tensor'>",)
I tried to look up possible solutions, which led me to tensorflow ValueError: features should be a dictionary of `Tensor`s. Given type: <class 'tensorflow.python.framework.ops.Tensor'>, and tried the provided solution:
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image = tf.image.decode_png(filename)
    features = {}
    features['pixels'] = image
    return features, label
But it gave me another error message, this time:
ValueError("Items of feature_columns must be a _FeatureColumn. Given (type <class 'set'>): {_NumericColumn(key='pixels', shape=(22500,), default_value=None, dtype=tf.float32, normalizer_fn=None)}.",)
This leads me to believe that I made some sort of fundamental error, either when parsing the data or assigning the labels, but I can't figure out where I went wrong.
EDIT:
I altered the code so that the estimator constructor is:
model = tf.estimator.DNNClassifier(feature_columns=construct_feature_columns(150,150),
                                   hidden_units=[1024,100],
                                   optimizer=tf.train.AdamOptimizer(1e-4),
                                   n_classes=2,
                                   dropout=0.1,
                                   model_dir="./tmp/fon_model")
and the construct_feature_columns into:
def construct_feature_columns(image_height, image_width):
    return set([tf.feature_column.numeric_column('pixels', shape=[image_height*image_width])])
Eliminating the ValueError:
ValueError("Items of feature_columns must be a _FeatureColumn. Given (type <class 'set'>): {_NumericColumn(key='pixels', shape=(22500,), default_value=None, dtype=tf.float32, normalizer_fn=None)}.",)
I also rewrote the input function into:
def input_fn():
    ds = dataset.shuffle(buffer_size=len(training_data))
    ds = ds.batch(16)
    ds = ds.repeat()
    iterator = ds.make_one_shot_iterator()
    images, labels = iterator.get_next()
    images = {'pixels': images}
    return images, labels
However, a new error turned up, confirming my belief that I messed up something fundamental:
UnimplementedError (see above for traceback): Cast string to float is not supported
[[Node: dnn/input_from_feature_columns/input_layer/pixels/ToFloat = Cast[DstT=DT_FLOAT, SrcT=DT_STRING, _device="/job:localhost/replica:0/task:0/device:CPU:0"](dnn/input_from_feature_columns/input_layer/pixels/ExpandDims)]]
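One thing that stands out in the parse function above is that image_string is read but never used: tf.image.decode_png is given the filename rather than the file contents, and the decoded pixels are never cast to float32 or flattened to match the numeric column of shape (22500,). A minimal sketch of a parse function along those lines (treating the images as single-channel 150x150 is an assumption inferred from the feature column size):
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    # Decode the file contents, not the filename
    image = tf.image.decode_png(image_string, channels=1)  # channels=1 is an assumption
    # numeric_column expects float32; flatten to match shape=(22500,)
    image = tf.cast(image, tf.float32)
    image = tf.reshape(image, [150 * 150])
    return {'pixels': image}, label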

TensorFlow decode_csv shape error

I read in a *.csv file using tf.data.TextLineDataset and apply map to it:
dataset = tf.data.TextLineDataset(os.path.join(data_dir, subset, 'label.txt'))
dataset = dataset.map(lambda value: parse_record_fn(value, is_training),
                      num_parallel_calls=num_parallel_calls)
The parse function parse_record_fn looks like this:
def parse_record(raw_record, is_training):
    default_record = ["./", -1]
    filename, label = tf.decode_csv([raw_record], default_record)
    # do something
    return image, label
But a ValueError is raised at tf.decode_csv in the parse function:
ValueError: Shape must be rank 1 but is rank 0 for 'DecodeCSV' (op: 'DecodeCSV') with input shapes: [1], [], [].
My *.csv file example:
/data/1.png, 5
/data/2.png, 7
Questions:
Where does it go wrong?
What does shapes: [1], [], [] mean?
Reproduce
This error can be reproduced in this code:
import tensorflow as tf
import os

def parse_record(raw_record, is_training):
    default_record = ["./", -1]
    filename, label = tf.decode_csv([raw_record], default_record)
    # do something
    return image, label

with tf.Session() as sess:
    csv_path = './labels.txt'
    dataset = tf.data.TextLineDataset(csv_path)
    dataset = dataset.map(lambda value: parse_record(value, True))
    sess.run(dataset)
Looking at the documentation of tf.decode_csv, it says about the default records:
record_defaults: A list of Tensor objects with specific types.
Acceptable types are float32, float64, int32, int64, string. One
tensor per column of the input record, with either a scalar default
value for that column or empty if the column is required.
I believe the error you are getting originates from how you define the tensor default_record. Your default_record certainly is a list of tensor objects (or objects convertible to tensors), but I think the error message is telling you that they should be rank-1 tensors, not rank-0 tensors as in your case.
You can fix the issue by making the default records rank-1 tensors. See the following toy example:
import tensorflow as tf

my_line = 'filename.png, 10'
default_record_1 = [['./'], [-1]]  # do this!
default_record_2 = ['./', -1]      # this is what you do now

decoded_1 = tf.decode_csv(my_line, default_record_1)
with tf.Session() as sess:
    d = sess.run(decoded_1)
    print(d)

# This will cause an error
decoded_2 = tf.decode_csv(my_line, default_record_2)
The error produced on the last line is familiar:
ValueError: Shape must be rank 1 but is rank 0 for 'DecodeCSV_1' (op: 'DecodeCSV') with input shapes: [], [], [].
In the message, the input shapes (the three brackets []) refer to the shapes of the input arguments records, record_defaults, and field_delim of tf.decode_csv. In your case the first of these shapes is [1] since you input [raw_record]. I agree that the message for this case is not very informative...
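Applied to the parse function from the question, a sketch of the fix might look like the following (passing the scalar raw_record directly instead of [raw_record], so the decoded fields come back as scalars, is an assumption about the intent; the elided image-loading step is kept as in the question):
def parse_record(raw_record, is_training):
    # One rank-1 default per column: [filename, label]
    default_record = [['./'], [-1]]
    filename, label = tf.decode_csv(raw_record, default_record)
    # do something
    return image, label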

How to initialize initial_state for LSTM in tf.nn.dynamic_rnn?

I am not sure how to pass a value for initial_state when the cell is an LSTMCell. I am using LSTMStateTuple, as shown in the following piece of code:
c_placeholder = tf.placeholder(tf.float32, [None, config.state_dim], name='c_lstm')
h_placeholder = tf.placeholder(tf.float32, [None, config.state_dim], name='h_lstm')
state_tuple = tf.nn.rnn_cell.LSTMStateTuple(c_placeholder, h_placeholder)
cell = tf.contrib.rnn.LSTMCell(num_units=config.state_dim, state_is_tuple=True, reuse=not is_training)
rnn_outs, states = tf.nn.dynamic_rnn(cell=cell, inputs=x, sequence_length=seqlen,
                                     initial_state=state_tuple, dtype=tf.float32)
However, the execution returns this error:
TypeError: 'Tensor' object is not iterable.
Here is the link to the documentation for dynamic_rnn.
I've seen this same error before. I was using multiple layers of RNN cells made with tf.contrib.rnn.MultiRNNCell, and I needed to specify a tuple of LSTMStateTuples -- one for each layer. Something like:
state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(c_ph[i], h_ph[i])
     for i in range(nRecurrentLayers)]
)
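A self-contained sketch of that multi-layer setup (nRecurrentLayers, state_dim, input_dim, and the per-layer placeholder lists c_ph and h_ph are illustrative assumptions, not names from the question):
import tensorflow as tf

nRecurrentLayers = 2
state_dim = 64
input_dim = 8

x = tf.placeholder(tf.float32, [None, None, input_dim])  # batch x time x features
seqlen = tf.placeholder(tf.int32, [None])

# One (c, h) placeholder pair per layer
c_ph = [tf.placeholder(tf.float32, [None, state_dim]) for _ in range(nRecurrentLayers)]
h_ph = [tf.placeholder(tf.float32, [None, state_dim]) for _ in range(nRecurrentLayers)]
state = tuple(tf.nn.rnn_cell.LSTMStateTuple(c_ph[i], h_ph[i])
              for i in range(nRecurrentLayers))

cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.contrib.rnn.LSTMCell(num_units=state_dim, state_is_tuple=True)
     for _ in range(nRecurrentLayers)])
rnn_outs, states = tf.nn.dynamic_rnn(cell=cell, inputs=x, sequence_length=seqlen,
                                     initial_state=state, dtype=tf.float32)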

Tensorflow - feeding examples with different length

Each of my training examples is a list of a different length.
I am trying to find a way to feed those examples into the graph.
Below is my attempt to do so by creating a list whose elements are placeholders with unknown dimensions.
graph2 = tf.Graph()
with graph2.as_default():
    A = list()
    for i in np.arange(3):
        A.append(tf.placeholder(tf.float32, shape=[None, None]))
    A_size = tf.shape(A)

with tf.Session(graph=graph2) as session:
    tf.initialize_all_variables().run()
    feed_dict = {A[0]: np.zeros((3,7)), A[1]: np.zeros((3,2)), A[2]: np.zeros((3,2))}
    print(type(feed_dict))
    B = session.run(A_size, feed_dict=feed_dict)
    print(type(B))
However, I got the following error:
InvalidArgumentError: Shapes of all inputs must match: values[0].shape = [3,7] != values[1].shape = [3,2]
Any idea on how to solve it?
From the documentation of tf.placeholder:
shape: The shape of the tensor to be fed (optional). If the shape is not specified, you can feed a tensor of any shape.
You need to write shape=None instead of shape=[None, None]. With your code, TensorFlow doesn't know you are dealing with variable-size input.
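A sketch of the adjusted graph (fetching each placeholder's shape separately is an assumption here, since tf.shape(A) applied to the whole list implicitly stacks the three placeholders into one tensor, which requires matching shapes at run time):
import numpy as np
import tensorflow as tf

graph2 = tf.Graph()
with graph2.as_default():
    # No shape constraint: a tensor of any shape can be fed
    A = [tf.placeholder(tf.float32, shape=None) for _ in range(3)]
    A_sizes = [tf.shape(a) for a in A]

with tf.Session(graph=graph2) as session:
    feed_dict = {A[0]: np.zeros((3,7)), A[1]: np.zeros((3,2)), A[2]: np.zeros((3,2))}
    # One shape per placeholder: [array([3, 7]), array([3, 2]), array([3, 2])]
    print(session.run(A_sizes, feed_dict=feed_dict))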

How to prevent Tensorflow unpack() method casting to float64

I'm trying to set up a sequential RNN in TensorFlow with seq2seq.rnn_decoder(). The input that rnn_decoder() wants is a list of tensors, so to generate this I've passed in a rank-3 tensor and used tf.unpack() to make it into a list. The problem arises when the float32 array that I pass in is turned into a float64 tensor by tf.unpack(), making it incompatible with the rest of the model. Here's the code I put together to convince myself that the culprit is tf.unpack():
inputDat = loader.getSequential(BATCH_SIZE)
print(inputDat.shape)
Output (BATCH_SIZE is five, sequence length is ten):
(10, 5, 3)
Then I can load this data in a Tensorflow session:
sess = tf.InteractiveSession()
input_tensor = tf.constant(inputDat.astype('float32'), dtype=tf.float32)
print "Input tensor type: " + str(type(input_tensor.eval()[0,0,0]))
input_tensor = tf.unpack(inputDat)
print "Input tensor shape: " + str(len(input_tensor)) + "x" + str(input_tensor[0].eval().shape)
print "Input tensor type: " + str(type(input_tensor[0].eval()[0,0]))
Output:
Input tensor type: <type 'numpy.float32'>
Input tensor shape: 10x(5, 3)
Input tensor type: <type 'numpy.float64'>
What's going on here? Using a for loop to iterate through each of the sequential entries and re-cast them seems like the wrong way to do this, and I can't find a method inside TensorFlow to cast every member of a list.
You don't need a for-loop: you can use tf.cast().
Example:
input_tensor = tf.unpack(inputDat) # result is 64-bit
input_tensor = tf.cast(input_tensor, tf.float32) # now it's 32-bit
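It is also worth noting that in the snippet from the question, tf.unpack is applied to the NumPy array inputDat (which NumPy creates as float64 by default) rather than to the float32 input_tensor constant built just before it. A sketch that unpacks the float32 tensor instead, which would avoid the cast entirely (assuming that was the intent):
input_tensor = tf.constant(inputDat.astype('float32'), dtype=tf.float32)
sequence = tf.unpack(input_tensor)  # a list of ten (5, 3) float32 tensors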
