How to test dataset iterators within jupyter notebook - python

I want to experiment with different functions to parse my csv and I am trying to use the tf.dataset iterators, yet I am having trouble getting this to work. My goal with the code below is to essentially print the first parsed row.
import tensorflow as tf
filenames = 'my_dataset.csv'
dataset = tf.data.TextLineDataset(filenames).skip(1).map(lambda row: parse_csv(row, hparams))
iterator = dataset.make_one_shot_iterator()
next_row = iterator.get_next()
with tf.Session() as sess:
#sess.run(iterator.initializer)
while True:
try:
print(sess.run(next_x))
except tf.errors.OutOfRangeError:
break
Now if I run this you will see that I get FailedPreconditionError (see above for traceback): GetNext() failed because the iterator has not been initialized. Ensure that you have run the initializer operation for this iterator before getting the next element. so then I proceed to uncomment the iterator.initializer and I get another error ValueError: Iterator does not have an initializer.
What changes need to be made to actually step through to see what is happening with my parse_csv call?

You run sess.run(next_x) instead of sess.run(next_row).

Related

Tensorflow iterator fails to iterate

I am working on a project related to instance segmentation. I am trying to train a SegNet with my own image dataset which comprises a set of images and their corresponding masks, and I have successfully used tf.Dataset to load my data. But every time I use the feedable iterator to feed the dataset to SegNet, my program is always terminated without any error or warning. My code is shown below.
load_satellite_image() is used to read filename for images and dataset() is used to load images with tf.Dataset. It seems that the iterator fails to update the input pipeline.
train_path = "data_example/train.txt"
val_path = "data_example/test.txt"
config_file = 'config.json'
with open(config_file) as f:
config = json.load(f)
train_img, train_mask = load_satellite_image(train_path)
val_img, val_mask = load_satellite_image(val_path)
train_dataset = dataset(train_img, train_mask, config, True, 0, 1)
val_dataset = dataset(val_img, val_mask, config, True, 0, 1)
train_iter = train_dataset.make_initializable_iterator()
validation_iter = val_dataset.make_initializable_iterator()
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle,
train_dataset.output_types,train_dataset.output_shapes)
next_element = iterator.get_next()
with tf.Session() as Sess:
sess.run(train_iter.initializer)
sess.run(validation_iter.initializer)
train_iter_handle = sess.run(train_iter.string_handle())
val_iter_handle = sess.run(validation_iter.string_handle())
for i in range(2):
print("1")
try:
while True:
for i in range(5):
print(sess.run(next_element,feed_dict={handle:train_iter_handle}))
print('----------------------------','\n')
for i in range(2):
print(sess.run(next_element,feed_dict={handle:val_iter_handle}))
except tf.errors.OutOfRangeError:
pass
After running the code above, I got:
In [2]: runfile('D:/python_code/tensorflow_study/SegNet/load_data.py',
wdir='D:/python_code/tensorflow_study/SegNet')
(tf.float32, tf.int32)
(TensorShape([Dimension(360), Dimension(480), Dimension(3)]), TensorShape([Dimension(360),
Dimension(480), Dimension(1)]))
(tf.float32, tf.int32)
(TensorShape([Dimension(360), Dimension(480), Dimension(3)]), TensorShape([Dimension(360),
Dimension(480), Dimension(1)]))
WARNING:tensorflow:From D:\Anaconda\envs\tensorflow-gpu\lib\site-
packages\tensorflow\python\data\ops\dataset_ops.py:1419: colocate_with (from
tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
In [1]:
I am confused that my code is terminated without any reason. As you can see, I can get the shape and datatype of training/ validation images and masks, which means the problem has nothing to do with my dataset. However, the for loop in the tf.Session() is not executed and I cannot get the result of print("1"). The iterator is not executed by sess.run() as well. Anyone have met this problem before?
Thanks!!!
Problem solved. It's a stupid mistake that wastes me a lot of time.
The reason why my program is terminated without error message is that I am using stupid Spyder to write my code, and I don't know why it doesn't show the error message. Actually, there exists an error message produced by TensorFlow. By coincidence, I ran my code via the command window of Anaconda and I got this error message:
2020-04-30 17:31:03.591207: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at whole_file_read_ops.cc:114 : Invalid argument: NewRandomAccessFile failed to Create/Open: D:\Study\PhD\python_code\tensorflow_study\SegNet\data_example\trainannot\ges_517405_679839_21.jpg
The iterator doesn't work because Tensorflow cannot find mask locations. The image and mask locations are stored in a text file like this:
data_example\train\ges_517404_679750_21.jpg,data_example\trainannot\ges_517404_679750_21.jpg
data_example\train\ges_517411_679762_21.jpg,data_example\trainannot\ges_517411_679762_21.jpg
The left side is the locations of raw images and the right side is the locations of their masks. In the beginning, I used split(",") to get the location of images and masks separately, but it seems that there is something wrong with the locations of masks. So I checked the code that is used to generate the text file:
file.writelines([Train_path[i],',',TrainAnnot_path[i],'\n'])
Each line in the text file ends with \n, and this is why Tensorflow cannot get the location of the masks. So I replaced file.writelines([Train_path[i],',',TrainAnnot_path[i],'\n'])with file.writelines([Train_path[i],' ',TrainAnnot_path[i],'\n']), and used strip().split(" ") rather than split(" "). That solves the problem.

RuntimeError: The Session graph is empty. Add operations to the graph before calling run()

I am training a neural network with keras, and since my dataset is very large I am using fit_generator to feed the data to the network.
As the first argument of fit_generator I have to provide a generator that generates patches of data to my model.
I use tf.data.Dataset in order to make a dataset and feed the network using make_one_shot_iterator and calling get_next method.
Here is the code
def generator():
dataset_iterator = DatasetGenerator(...) # defined class to returns a tf iterator
with tf.Session() as sess:
next_batch = dataset_iterator.get_next()
while True:
img, label = sess.run(next_batch)
# some process on label
yield img, label
# down in the code for training:
model.fit_generator(generator=generator(), ...)
This works perfectly fine.
The problem begins when I try to send dataset_iterator as the argument to the generator method, like this:
def generator(dataset_iterator):
with tf.Session() as sess:
next_batch = dataset_iterator.get_next()
while True:
img, label = sess.run(next_batch)
# some process on label
yield img, label
# down in the code for training:
dataset_iterator = DatasetGenerator(...)
model.fit_generator(generator=generator(dataset_iterator), ...)
Now, I get the following error:
RuntimeError: The Session graph is empty. Add operations to the graph before calling run().
I found a way to handle it.
What I found out was that printing tf.get_default_graph() in generator method and in main method (I mean before calling model.fit_generator) returns different graphs.
Why? I have no idea!
Anyway, I solved it by sending default graph as another argument to the function and introducing it to tf.Session(). Like this:
def generator(dataset_iterator, default_graph):
with tf.Session(graph=default_graph) as sess:
next_batch = dataset_iterator.get_next()
while True:
img, label = sess.run(next_batch)
# some process on label
yield img, label
# down in the code for training:
dataset_iterator = DatasetGenerator(...)
default_graph = tf.get_default_graph()
model.fit_generator(generator=generator(dataset_iterator, default_graph), ...)
I actually don't know if this is the most elegant way to solve the problem. Further improvements are greatly appreciated :)
It's eager execution, disable it if you want to create an empty session.
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

How to use tf.cond in combination with batching operations / queue runners

Situation
I want to train a specific network architecture (a GAN) that needs inputs from different sources during training.
One input source is examples loaded from disk. The other source is a generator sub-network creating examples.
To choose which kind of input to feed to the network I use tf.cond. There is one caveat though that has already been explained: tf.cond evaluates the inputs to both conditional branches even though only one of those will ultimately be used.
Enough setup, here is a minimal working example:
import numpy as np
import tensorflow as tf
BATCH_SIZE = 32
def load_input_data():
# Normally this data would be read from disk
data = tf.reshape(np.arange(10 * BATCH_SIZE, dtype=np.float32), shape=(10 * BATCH_SIZE, 1))
return tf.train.batch([data], BATCH_SIZE, enqueue_many=True)
def generate_input_data():
# Normally this data would be generated by a much bigger sub-network
return tf.random_uniform(shape=[BATCH_SIZE, 1])
def main():
# A bool to choose between loaded or generated inputs
load_inputs_pred = tf.placeholder(dtype=tf.bool, shape=[])
# Variant 1: Call "load_input_data" inside tf.cond
data_batch = tf.cond(load_inputs_pred, load_input_data, generate_input_data)
# Variant 2: Call "load_input_data" outside tf.cond
#loaded_data = load_input_data()
#data_batch = tf.cond(load_inputs_pred, lambda: loaded_data, generate_input_data)
init_op = tf.initialize_all_variables()
with tf.Session() as sess:
sess.run(init_op)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print(threads)
# Get generated input data
data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: False})
print(data_batch_values)
# Get input data loaded from disk
data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: True})
print(data_batch_values)
if __name__ == '__main__':
main()
Problem
Variant 1 does not work at all since the queue runner threads don't seem to run. print(threads) outputs something like [<Thread(Thread-1, stopped daemon 140165838264064)>, ...].
Variant 2 does work and print(threads) outputs something like [<Thread(Thread-1, started daemon 140361854863104)>, ...]. But since load_input_data() has been called outside of tf.cond, batches of data will be loaded from disk even when load_inputs_pred is False.
Is it possible to make Variant 1 work, so that input data is only loaded when load_inputs_pred is True and not for every call to session.run()?
If you're using a queue when loading your data and follow it up with a batch input then this shouldn't be a problem as you can specify the max amount to have loaded or stored in the queue.
input = tf.WholeFileReader(somefilelist) # or another way to load data
return tf.train.batch(input,batch_size=10,capacity=100)
See here for more details:
https://www.tensorflow.org/versions/r0.10/api_docs/python/io_ops.html#batch
Also there's an alternative approach that skips the tf.cond completely. Just define two losses one that follows the data through the autoencoder and discrimator and the other that follows the data through just the discriminator.
Then it just becomes a matter of calling
sess.run(auto_loss,feed_dict)
or
sess.run(real_img_loss,feed_dict)
In this way the graph will only run through which ever loss was called upon. Let me know if this needs more explanation.
Lastly I think to make variant one work you need to do something like this if you're using preloaded data.
https://www.tensorflow.org/versions/r0.10/how_tos/reading_data/index.html#preloaded-data
Otherwise I'm not sure what the issue is to be honest.

Tensorflow session returns as 'closed'

I have successfully ported the CIFAR-10 ConvNet tutorial code for my own images and am able to train on my data and generate Tensorboard outputs etc.
My next step was to implement an evaluation of new data against the model I built. I am trying now to use cifar10_eval.py as a starting point however am running into some difficulty.
I should point out that the original tutorial code runs entirely without a problem, including cifar10_eval.py. However, when moving this particular code to my application, I get the following error message (last line).
RuntimeError: Attempted to use a closed Session.
I found this error is thrown by TF's session.py
# Check session.
if self._closed:
raise RuntimeError('Attempted to use a closed Session.')
I have checked the directories in which all files should reside and be created, and all seems exactly as it should (they mirror perfectly those created by running the original tutorial code). They include a train, eval and data folders, containing checkpoints/events files, events file, and data binaries respectively.
I wonder if you could help pointing out how I can debug this, as I'm sure there may be something in the data flow that got disrupted when transitioning the code. Unfortunately, despite digging deep and comparing to the original, I can't find the source, as they are essentially similar with trivial changes in file names and destination directories only.
EDIT_01:
Debugging step by step, it seems the line that actually throws the error is #106 in the original cifar10_eval.py:
def eval_once(args etc)
...
with tf.Session() as sess:
...
summary = tf.Summary()
summary.ParseFromString(sess.run(summary_op)) # <========== line 106
summary_op is created in def evaluate of this same script and passed as an arg to def eval_once.
summary_op = tf.merge_all_summaries()
...
while True:
eval_once(saver, summary_writer, top_k_op, summary_op)
From documentation on Session, a session can be closed with .close command or when using it through a context-manager in with block. I did find tensorflow/models/image/cifar10 | xargs grep "sess" and I don't see any sess.close, so it must be the later.
IE, you'll get this error if you do something like this
with tf.Session() as sess:
sess.run(..)
sess.run(...) # Attempted to use a closed Session.
It was a simple (but humbling) error in indentation.
summary = tf.Summary()
summary.ParseFromString(sess.run(summary_op))
summary.value.add(tag='Precision # 1', simple_value=precision)
summary_writer.add_summary(summary, global_step)
was outside of the try: block, and of course, no session could be found.
Sigh.

TensorFlow: Node not found

I am training a neural network and have been running this code without any problems but sometimes (twice) I get an error Not Found: FetchOutputs node not found at the line y_1 = sess.run(get_labels(step)) (See below).
get_labels(step) is a function to return the correct labels of my training images which is in a text file.
def get_labels(step):
with open('labels.txt','r') as fin:
reader = csv.reader(fin)
c = [[int(s) for s in row] for i,row in enumerate(reader) if i==step]
label_numbers = np.array(c)
# Convert to one-hot vectors
numpy_label = np.zeros((BATCH_SIZE,5))
for i in range(BATCH_SIZE):
numpy_label[i,label_numbers[0][i]-1] = 1
# Convert to tensor
y_label = tf.convert_to_tensor(numpy_label,dtype=tf.float32)
return y_label
This is my main function:
def main():
# Placeholder for correct labels
y_label = tf.placeholder(tf.float32,shape=[BATCH_SIZE,5])
< Other functions etc. >
sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess=sess)
for step in range(1000):
# Get labels for current batch
y_1 = sess.run(get_labels(step))
# Train
sess.run([train_step],feed_dict={y_label:y_1})
< Other stuff like writing summaries, saving variables etc. >
sess.close()
From reading some of the issues on GitHub, I know this is to do with the fact that I call y_1 = sess.run(get_labels(step)) after tf.train.start_queue_runners(sess=sess) but I don't understand:
why it works most of the time, but occasionally doesn't?
Is y_1 = sess.run(get_labels(step)) adding or modifying nodes in the graph? I thought I was just running a node get_labels(step) that was already defined in the graph. I tried finalizing the graph before starting the queue runners but that gave me the error that finalized graphs cannot be modified.
What would be the proper way to write the code? Usually I just restart my program and it is fine - but clearly I am not doing it the proper way.
Thank you!
EDIT:
I think it might be important to mention that this happens when I am trying to run a TensorFlow script in a separate screen on a server i.e. I have one screen running a TensorFlow script and now I create a new screen to run a different TensorFlow script. I just started using screens so I might be missing something fundamental about how they work.

Categories

Resources