Tensorflow iterator fails to iterate

Tensorflow iterator fails to iterate - python

I am working on a project related to instance segmentation. I am trying to train a SegNet with my own image dataset which comprises a set of images and their corresponding masks, and I have successfully used tf.Dataset to load my data. But every time I use the feedable iterator to feed the dataset to SegNet, my program is always terminated without any error or warning. My code is shown below.
load_satellite_image() is used to read filename for images and dataset() is used to load images with tf.Dataset. It seems that the iterator fails to update the input pipeline.
train_path = "data_example/train.txt"
val_path = "data_example/test.txt"
config_file = 'config.json'
with open(config_file) as f:
config = json.load(f)
train_img, train_mask = load_satellite_image(train_path)
val_img, val_mask = load_satellite_image(val_path)
train_dataset = dataset(train_img, train_mask, config, True, 0, 1)
val_dataset = dataset(val_img, val_mask, config, True, 0, 1)
train_iter = train_dataset.make_initializable_iterator()
validation_iter = val_dataset.make_initializable_iterator()
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle,
train_dataset.output_types,train_dataset.output_shapes)
next_element = iterator.get_next()
with tf.Session() as Sess:
sess.run(train_iter.initializer)
sess.run(validation_iter.initializer)
train_iter_handle = sess.run(train_iter.string_handle())
val_iter_handle = sess.run(validation_iter.string_handle())
for i in range(2):
print("1")
try:
while True:
for i in range(5):
print(sess.run(next_element,feed_dict={handle:train_iter_handle}))
print('----------------------------','\n')
for i in range(2):
print(sess.run(next_element,feed_dict={handle:val_iter_handle}))
except tf.errors.OutOfRangeError:
pass
After running the code above, I got:
In [2]: runfile('D:/python_code/tensorflow_study/SegNet/load_data.py',
wdir='D:/python_code/tensorflow_study/SegNet')
(tf.float32, tf.int32)
(TensorShape([Dimension(360), Dimension(480), Dimension(3)]), TensorShape([Dimension(360),
Dimension(480), Dimension(1)]))
(tf.float32, tf.int32)
(TensorShape([Dimension(360), Dimension(480), Dimension(3)]), TensorShape([Dimension(360),
Dimension(480), Dimension(1)]))
WARNING:tensorflow:From D:\Anaconda\envs\tensorflow-gpu\lib\site-
packages\tensorflow\python\data\ops\dataset_ops.py:1419: colocate_with (from
tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
In [1]:
I am confused that my code is terminated without any reason. As you can see, I can get the shape and datatype of training/ validation images and masks, which means the problem has nothing to do with my dataset. However, the for loop in the tf.Session() is not executed and I cannot get the result of print("1"). The iterator is not executed by sess.run() as well. Anyone have met this problem before?
Thanks!!!

Problem solved. It's a stupid mistake that wastes me a lot of time.
The reason why my program is terminated without error message is that I am using stupid Spyder to write my code, and I don't know why it doesn't show the error message. Actually, there exists an error message produced by TensorFlow. By coincidence, I ran my code via the command window of Anaconda and I got this error message:
2020-04-30 17:31:03.591207: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at whole_file_read_ops.cc:114 : Invalid argument: NewRandomAccessFile failed to Create/Open: D:\Study\PhD\python_code\tensorflow_study\SegNet\data_example\trainannot\ges_517405_679839_21.jpg
The iterator doesn't work because Tensorflow cannot find mask locations. The image and mask locations are stored in a text file like this:
data_example\train\ges_517404_679750_21.jpg,data_example\trainannot\ges_517404_679750_21.jpg
data_example\train\ges_517411_679762_21.jpg,data_example\trainannot\ges_517411_679762_21.jpg
The left side is the locations of raw images and the right side is the locations of their masks. In the beginning, I used split(",") to get the location of images and masks separately, but it seems that there is something wrong with the locations of masks. So I checked the code that is used to generate the text file:
file.writelines([Train_path[i],',',TrainAnnot_path[i],'\n'])
Each line in the text file ends with \n, and this is why Tensorflow cannot get the location of the masks. So I replaced file.writelines([Train_path[i],',',TrainAnnot_path[i],'\n'])with file.writelines([Train_path[i],' ',TrainAnnot_path[i],'\n']), and used strip().split(" ") rather than split(" "). That solves the problem.

Related

Concatenating two saved models in tensorflow 1.13 [duplicate]

I've trained a DCGAN model and would now like to load it into a library that visualizes the drivers of neuron activation through image space optimization.
The following code works, but forces me to work with (1, width, height, channels) images when doing subsequent image analysis, which is a pain (the library assumptions about the shape of network input).
# creating TensorFlow session and loading the model
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
new_saver = tf.train.import_meta_graph(model_fn)
new_saver.restore(sess, './')
I'd like to change the input_map, After reading the source, I expected this code to work:
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
t_input = tf.placeholder(np.float32, name='images') # define the input tensor
t_preprocessed = tf.expand_dims(t_input, 0)
new_saver = tf.train.import_meta_graph(model_fn, input_map={'images': t_input})
new_saver.restore(sess, './')
But got an error:
ValueError: tf.import_graph_def() requires a non-empty name if input_map is used.
When the stack gets down to tf.import_graph_def() the name field is set to import_scope, so I tried the following:
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
t_input = tf.placeholder(np.float32, name='images') # define the input tensor
t_preprocessed = tf.expand_dims(t_input, 0)
new_saver = tf.train.import_meta_graph(model_fn, input_map={'images': t_input}, import_scope='import')
new_saver.restore(sess, './')
Which netted me the following KeyError:
KeyError: "The name 'gradients/discriminator/minibatch/map/while/TensorArrayWrite/TensorArrayWriteV3_grad/TensorArrayReadV3/RefEnter:0' refers to a Tensor which does not exist. The operation, 'gradients/discriminator/minibatch/map/while/TensorArrayWrite/TensorArrayWriteV3_grad/TensorArrayReadV3/RefEnter', does not exist in the graph."
If I set 'import_scope', I get the same error whether or not I set 'input_map'.
I'm not sure where to go from here.

In the newer version of tensorflow>=1.2.0, the following step works fine.
t_input = tf.placeholder(np.float32, shape=[None, width, height, channels], name='new_input') # define the input tensor
# here you need to give the name of the original model input placeholder name
# For example if the model has input as; input_original= tf.placeholder(tf.float32, shape=(1, width, height, channels, name='original_placeholder_name'))
new_saver = tf.train.import_meta_graph(/path/to/checkpoint_file.meta, input_map={'original_placeholder_name:0': t_input})
new_saver.restore(sess, '/path/to/checkpointfile')

So, the main issue is that you're not using the syntax right. Check the documentation for tf.import_graph_def for the use of input_map (link).
Let's breakdown this line:
new_saver = tf.train.import_meta_graph(model_fn, input_map={'images': t_input}, import_scope='import')
You didn't outline what model_fn is, but it needs to be a path to the file.
For the next part, in input_map, you're saying: replace the input in the original graph (DCGAN) whose name is images with my variable (in the current graph) called t_input. Problematically, t_input and images are referencing the same object in different ways as per this line:
t_input = tf.placeholder(np.float32, name='images')
In other words, images in input_map should actually be whatever the variable name is that you're trying to replace in the DCGAN graph. You'll have to import the graph in its base form (i.e., without the input_map line) and figure out what the name of the variable you want to link to is. It'll be in the list returned by tf.get_collection('variables') after you have imported the graph. Look for the dimensions (1, width, height, channels), but with the values in place of the variable names. If it's a placeholder, it'll look something like scope/Placeholder:0 where scope is replaced with whatever the variable's scope is.
Word of caution:
Tensorflow is very finicky about what it expects graphs to look like. So, if in the original graph specification the width, height, and channels are explicitly specified, then Tensorflow will complain (throw an error) when you try to connect a placeholder with a different set of dimensions. And, this makes sense. If the system was trained with some set of dimensions, then it only knows how to generate images with those dimensions.
In theory, you can still stick all kinds of weird stuff on the front of that network. But, you will need to scale it down so it meets those dimensions first (and the Tensorflow documentation says it's better to do that with the CPU outside of the graph; i.e., before inputing it with feed_dict).
Hope that helps!

How to test dataset iterators within jupyter notebook

I want to experiment with different functions to parse my csv and I am trying to use the tf.dataset iterators, yet I am having trouble getting this to work. My goal with the code below is to essentially print the first parsed row.
import tensorflow as tf
filenames = 'my_dataset.csv'
dataset = tf.data.TextLineDataset(filenames).skip(1).map(lambda row: parse_csv(row, hparams))
iterator = dataset.make_one_shot_iterator()
next_row = iterator.get_next()
with tf.Session() as sess:
#sess.run(iterator.initializer)
while True:
try:
print(sess.run(next_x))
except tf.errors.OutOfRangeError:
break
Now if I run this you will see that I get FailedPreconditionError (see above for traceback): GetNext() failed because the iterator has not been initialized. Ensure that you have run the initializer operation for this iterator before getting the next element. so then I proceed to uncomment the iterator.initializer and I get another error ValueError: Iterator does not have an initializer.
What changes need to be made to actually step through to see what is happening with my parse_csv call?

You run sess.run(next_x) instead of sess.run(next_row).

Why match_filenames_once function returns a local variable

I was trying to understand the mechanism of tensorflow for reading images using queues. I was using the code found here, whom basic parts are:
filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once('D:/Dataset/*.jpg'))
image_reader = tf.WholeFileReader()
image_name, image_file = image_reader.read(filename_queue)
image = tf.image.decode_jpeg(image_file)
with tf.Session() as sess:
tf.global_variables_initializer().run()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
image_tensor = sess.run([image])
print(image_tensor)
which in reality does nothing special. I was getting an error:
OutOfRangeError (see above for traceback): FIFOQueue
'_0_input_producer' is closed and has insufficient elements (requested
1, current size 0)
which lead to search for missing images, wrong folder, wrong glob pattern etc until I discovered that tensorflow basically meant this:
"You need to initialize local variables also"!
Besides the fact that the code seemed to work in the original gist with just this substitution:
tf.initialize_all_variables().run()
instead of
tf.global_variables_initializer().run()
in my code it does not work. It produces the same error. I guess it has changed the implementation of initialize_all_variables() with tensorflow development (I am using 1.3.0), since in here it mentions that it initialize local variables also.
So, the final conclusion I came with was that I should initialize local variables also. And my code worked. The error message is awfully misleading (which did not help at all) but anyway to the main part I am a bit confused why am I getting a local variable by match_filenames_once. In documentation there is no reference about this (I am not sure it should though).
Am I always going to get local from this match_filenames_once? Can I control it somehow?

MxNet: label_shapes don't match names specified by label_names

I wrote a script to do the classification of a single input image using a model I trained with MxNet. To classify the incoming image I feedforward them in through network.
In short here is what I am doing:
symbol, arg_params, aux_params = mx.model.load_checkpoint('model-prefix', 42)
model = mx.mod.Module(symbol=symbol, context=mx.cpu())
model.bind(data_shapes=[('data', (1, 3, 224, 244))], for_training=False)
model.set_params(arg_params, aux_params)
# ... loading the image & resizing ...
# img is the image to classify as numpy array of shape (3, 244, 244)
Batch = namedtuple('Batch', ['data'])
self._model.forward(Batch(data=[mx.nd.array(img)]))
probabilities = self._model.get_outputs()[0].asnumpy()
print(str(probabilities))
This works fine, except that I am getting the following warning
UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
What should I change to avoid getting this warning? It is not clear to me what the label_shapes and label_names parameters are meant for, and what I am expect to fill them with.
Note: I found some thread about them, but none enabled me to solve the problem. Similarly the MxNet documentation doesn't provide much details on what those parameters are and on how they are supposed to be filled.

Set label_names=None and allow_missing=True. That should get rid of the warning.
model = mx.mod.Module(symbol=symbol, context=mx.cpu(), label_names=None)
...
model.set_params(arg_params, aux_params, allow_missing=True)
If you are curious why the warning is printed in the first place,
Every module has associated label. When this model was trained, softmax_label was used as the label (most likely because the output layer was a softmax layer named 'softmax'). When the model was loaded from file, the module that was created had softmax_label as the module's label.
>>>print(model.label_names)
['softmax_label']
model.bind is then called without providing label_shapes.
model.bind(data_shapes=[('data', (1, 3, 224, 244))], for_training=False)
MXNet sees that the module has a label in it which was not provided during bind and complains about it - which is the warning message you see.
I think if bind is called with for_training=False, MXNet shouldn't complain about the missing label. I've created this issue: https://github.com/dmlc/mxnet/issues/6958
However, for this particular case where we load a model from disk, we can load it with None as the label so that MXNet doesn't later complain when bind doesn't provide label - which is what the suggested fix does.

TensorFlow: Node not found

I am training a neural network and have been running this code without any problems but sometimes (twice) I get an error Not Found: FetchOutputs node not found at the line y_1 = sess.run(get_labels(step)) (See below).
get_labels(step) is a function to return the correct labels of my training images which is in a text file.
def get_labels(step):
with open('labels.txt','r') as fin:
reader = csv.reader(fin)
c = [[int(s) for s in row] for i,row in enumerate(reader) if i==step]
label_numbers = np.array(c)
# Convert to one-hot vectors
numpy_label = np.zeros((BATCH_SIZE,5))
for i in range(BATCH_SIZE):
numpy_label[i,label_numbers[0][i]-1] = 1
# Convert to tensor
y_label = tf.convert_to_tensor(numpy_label,dtype=tf.float32)
return y_label
This is my main function:
def main():
# Placeholder for correct labels
y_label = tf.placeholder(tf.float32,shape=[BATCH_SIZE,5])
< Other functions etc. >
sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess=sess)
for step in range(1000):
# Get labels for current batch
y_1 = sess.run(get_labels(step))
# Train
sess.run([train_step],feed_dict={y_label:y_1})
< Other stuff like writing summaries, saving variables etc. >
sess.close()
From reading some of the issues on GitHub, I know this is to do with the fact that I call y_1 = sess.run(get_labels(step)) after tf.train.start_queue_runners(sess=sess) but I don't understand:
why it works most of the time, but occasionally doesn't?
Is y_1 = sess.run(get_labels(step)) adding or modifying nodes in the graph? I thought I was just running a node get_labels(step) that was already defined in the graph. I tried finalizing the graph before starting the queue runners but that gave me the error that finalized graphs cannot be modified.
What would be the proper way to write the code? Usually I just restart my program and it is fine - but clearly I am not doing it the proper way.
Thank you!
EDIT:
I think it might be important to mention that this happens when I am trying to run a TensorFlow script in a separate screen on a server i.e. I have one screen running a TensorFlow script and now I create a new screen to run a different TensorFlow script. I just started using screens so I might be missing something fundamental about how they work.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.