TensorFlow: Node not found - python

I am training a neural network and have been running this code without any problems but sometimes (twice) I get an error Not Found: FetchOutputs node not found at the line y_1 = sess.run(get_labels(step)) (See below).
get_labels(step) is a function that returns the correct labels for my training images, which are stored in a text file.
def get_labels(step):
    with open('labels.txt','r') as fin:
        reader = csv.reader(fin)
        c = [[int(s) for s in row] for i,row in enumerate(reader) if i==step]
    label_numbers = np.array(c)
    # Convert to one-hot vectors
    numpy_label = np.zeros((BATCH_SIZE,5))
    for i in range(BATCH_SIZE):
        numpy_label[i,label_numbers[0][i]-1] = 1
    # Convert to tensor
    y_label = tf.convert_to_tensor(numpy_label,dtype=tf.float32)
    return y_label
This is my main function:
def main():
    # Placeholder for correct labels
    y_label = tf.placeholder(tf.float32,shape=[BATCH_SIZE,5])
    < Other functions etc. >
    for step in range(1000):
        # Get labels for current batch
        y_1 = sess.run(get_labels(step))
        # Train
        < Other stuff like writing summaries, saving variables etc. >
From reading some of the issues on GitHub, I know this is to do with the fact that I call y_1 = sess.run(get_labels(step)) after tf.train.start_queue_runners(sess=sess) but I don't understand:
Why does it work most of the time but occasionally fail?
Is y_1 = sess.run(get_labels(step)) adding or modifying nodes in the graph? I thought I was just running a node get_labels(step) that was already defined in the graph. I tried finalizing the graph before starting the queue runners but that gave me the error that finalized graphs cannot be modified.
What would be the proper way to write the code? Usually I just restart my program and it is fine - but clearly I am not doing it the proper way.
Thank you!
I think it might be important to mention that this happens when I am trying to run a TensorFlow script in a separate screen on a server i.e. I have one screen running a TensorFlow script and now I create a new screen to run a different TensorFlow script. I just started using screens so I might be missing something fundamental about how they work.
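On the question of whether the call adds nodes: as far as I understand, tf.convert_to_tensor on a NumPy array creates a new constant op in the default graph every time get_labels(step) is called, which would be consistent with the "finalized graphs cannot be modified" error. A common pattern is to keep the labels as plain Python/NumPy data and feed them to the existing y_label placeholder instead of running them. Below is a minimal sketch of the label-building half only, written with the standard library so it runs on its own. The in-memory file, the function name one_hot_labels and the name train_op in the comment are hypothetical, and labels are assumed to be 1-based as in the code above.

```python
import csv
import io

NUM_CLASSES = 5  # matches the 5 columns used in the question


def one_hot_labels(csv_file, step, num_classes=NUM_CLASSES):
    """Return one-hot rows (plain lists) for the labels on line `step`."""
    reader = csv.reader(csv_file)
    for i, row in enumerate(reader):
        if i == step:
            labels = [int(s) for s in row]
            break
    else:
        raise IndexError("no label row for step %d" % step)
    one_hot = [[0.0] * num_classes for _ in labels]
    for j, label in enumerate(labels):
        one_hot[j][label - 1] = 1.0  # labels are assumed to be 1-based
    return one_hot


# Hypothetical stand-in for labels.txt: one CSV row of labels per step.
fake_file = io.StringIO("1,3,5\n2,2,4\n")
batch = one_hot_labels(fake_file, step=1)
print(batch)

# In the training loop the result would be fed, not run, for example:
#   sess.run(train_op, feed_dict={y_label: batch})
# where train_op is a hypothetical name for the training op.
```

Because no TensorFlow function is called inside the loop, the graph stays unchanged between steps and could be finalized before the queue runners start.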


Python terminates when working with Memory layers in Python GDAL/OGR

I've been working on a software project that processes geometric files and combines them with cadastral data. I managed to find solutions for the various subtasks needed to accomplish the main task.
Recently I've been trying to streamline the code and put together the various code snippets I have written so far. This is when I ran into issues with the GDAL/OGR Memory driver. The code worked fine with other drivers (WFS, DXF and ESRI Shapefile), but not with the Memory driver.
1) Read a Shapefile/DXF file, pass the layer retrieved from the file to prepare_layer and return the resulting layer as a Memory layer. Up until the return statement everything works fine. After the layer "is returned", the code terminates shortly afterwards without throwing an exception. This error can be circumvented by moving the code from the prepare_layer function to the main function. Still, that is not ideal and I would like to understand why this happens.
from osgeo import ogr
def prepare_layer(input_layer):
    mem_driver = ogr.GetDriverByName('MEMORY')
    mem_ds = mem_driver.CreateDataSource('memData')
    mem_open = mem_driver.Open('memData',1)
    mem_layer = ogr.DataSource.CreateLayer(mem_ds, 'Layer1')
    input_layer_defn = input_layer.GetLayerDefn()
    for i in range(input_layer_defn.GetFieldCount()):
        field_defn = input_layer_defn.GetFieldDefn(i)
    mem_defn = mem_layer.GetLayerDefn()
    for feature in input_layer:
        geom = feature.GetGeometryRef()
        geom_type = geom.GetGeometryName()
        if geom_type == 'LINESTRING':
            geom = geom.Buffer(2)
        out_feature = ogr.Feature(mem_defn)
        for i in range(feature.GetFieldCount()):
            field_attr = feature.GetField(i)
            out_feature.SetField(i, field_attr)
    return mem_layer
input_driver = ogr.GetDriverByName('ESRI Shapefile')
input_open = input_driver.Open(file_path_in, 0)
input_layer = input_open.GetLayer()
output_layer = prepare_layer(input_layer)
2) Retrieve WFS data, extract the layer and intersect the layer with the processed memory layer from 1). Result: The terminal throws several HTTP 400 server errors. If I comment out the memory layer declaration under "# Result layer", the data is retrieved without issues. So regardless of the error message, this is not a server issue. Further, if I manually paste the exact same HTTP request that get_wfs() produces into my browser, I can download the data without problems.
I left out the code for these functions for brevity since they work smoothly with other drivers:
get_bbox(): returns a string with the bounding box coordinates
get_wfs(feature, box): returns an ogr.Driver.Open object of a WFS driver
# Result Layer
res_driver = ogr.GetDriverByName('MEMORY')
res_ds = res_driver.CreateDataSource('result')
res_open = res_driver.Open('result',1)
res_layer = res_ds.CreateLayer('result', target, ogr.wkbPolygon)
# WFS Flurstück
bbox = get_bbox(mem_layer)
FlSt = get_wfs(feature, bbox)
FlSt_layer = FlSt.GetLayer()
# Intersection
ogr.Layer.Intersection(mem_layer, FlSt_layer, res_layer)
I would appreciate any help or pointers in the right direction. I hope this wasn't too convoluted.

Tensorflow iterator fails to iterate

I am working on a project related to instance segmentation. I am trying to train a SegNet with my own image dataset which comprises a set of images and their corresponding masks, and I have successfully used tf.Dataset to load my data. But every time I use the feedable iterator to feed the dataset to SegNet, my program is always terminated without any error or warning. My code is shown below.
load_satellite_image() reads the image filenames and dataset() loads the images with tf.Dataset. It seems that the iterator fails to update the input pipeline.
train_path = "data_example/train.txt"
val_path = "data_example/test.txt"
config_file = 'config.json'
with open(config_file) as f:
    config = json.load(f)
train_img, train_mask = load_satellite_image(train_path)
val_img, val_mask = load_satellite_image(val_path)
train_dataset = dataset(train_img, train_mask, config, True, 0, 1)
val_dataset = dataset(val_img, val_mask, config, True, 0, 1)
train_iter = train_dataset.make_initializable_iterator()
validation_iter = val_dataset.make_initializable_iterator()
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle,
next_element = iterator.get_next()
with tf.Session() as sess:
    train_iter_handle = sess.run(train_iter.string_handle())
    val_iter_handle = sess.run(validation_iter.string_handle())
for i in range(2):
while True:
for i in range(5):
for i in range(2):
except tf.errors.OutOfRangeError:
After running the code above, I got:
In [2]: runfile('D:/python_code/tensorflow_study/SegNet/load_data.py',
(tf.float32, tf.int32)
(TensorShape([Dimension(360), Dimension(480), Dimension(3)]), TensorShape([Dimension(360), Dimension(480), Dimension(1)]))
(tf.float32, tf.int32)
(TensorShape([Dimension(360), Dimension(480), Dimension(3)]), TensorShape([Dimension(360), Dimension(480), Dimension(1)]))
WARNING:tensorflow:From D:\Anaconda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py:1419: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
In [1]:
I am confused that my code terminates without any apparent reason. As you can see, I can get the shape and datatype of the training/validation images and masks, which suggests the problem has nothing to do with my dataset. However, the for loop inside tf.Session() is not executed and I never get the result of print("1"). The iterator is not executed by sess.run() either. Has anyone met this problem before?
Problem solved. It was a stupid mistake that cost me a lot of time.
The reason my program terminated without an error message is that I was writing my code in Spyder, which for some reason did not show the error message. TensorFlow did in fact produce one. By coincidence, I ran my code from the Anaconda command window and got this error message:
2020-04-30 17:31:03.591207: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at whole_file_read_ops.cc:114 : Invalid argument: NewRandomAccessFile failed to Create/Open: D:\Study\PhD\python_code\tensorflow_study\SegNet\data_example\trainannot\ges_517405_679839_21.jpg
The iterator doesn't work because Tensorflow cannot find mask locations. The image and mask locations are stored in a text file like this:
The left side is the locations of raw images and the right side is the locations of their masks. In the beginning, I used split(",") to get the location of images and masks separately, but it seems that there is something wrong with the locations of masks. So I checked the code that is used to generate the text file:
Each line in the text file ends with \n, and this is why TensorFlow cannot find the masks: the trailing newline stays attached to the mask path. So I replaced file.writelines([Train_path[i],',',TrainAnnot_path[i],'\n']) with file.writelines([Train_path[i],' ',TrainAnnot_path[i],'\n']), and used strip().split(" ") rather than split(" "). That solved the problem.
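To make the difference concrete, here is a small sketch of the two ways of splitting a line. The file names are made up for illustration and parse_pairs is a hypothetical helper, not the code from my project. It assumes paths contain no spaces.

```python
def parse_pairs(lines, sep=" "):
    """Split 'image<sep>mask' lines into (image, mask) tuples."""
    pairs = []
    for line in lines:
        line = line.strip()  # drop the trailing newline first
        if not line:
            continue  # skip blank lines
        image_path, mask_path = line.split(sep)
        pairs.append((image_path, mask_path))
    return pairs


# Hypothetical lines as they would be read from the text file.
lines = ["train/a.jpg trainannot/a.jpg\n", "train/b.jpg trainannot/b.jpg\n"]

naive = lines[0].split(" ")   # newline stays attached to the mask path
clean = parse_pairs(lines)    # newline removed before splitting

print(repr(naive[1]))
print(repr(clean[0][1]))
```

The first print shows the mask path with the newline still attached, which is the kind of path the file reader could not open.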

Debugging Tensorflow hang on global variables initialisation

I'm after advice on how to debug what TensorFlow is struggling with when it hangs.
I have a multi-layer CNN which hangs when global_variables_initializer() is run in the session. I get no errors or messages on the console output.
Is there an intelligent way of debugging what TensorFlow is struggling with when it hangs, instead of repeatedly commenting out the lines of code that build the graph and re-running to see where it hangs? Would the TensorFlow debugger (tfdbg) help? What options do I have?
Ideally it would be great to just break the current execution and look at a stack trace or similar to see where the execution is hanging during the init.
I'm currently running TensorFlow 0.12.1 with Python 3 inside a Jupyter notebook.
I managed to solve the problem. The tip from #amo-ej1 to run the code as a regular script file was a step in the right direction. This uncovered that the TensorFlow process was being killed with a SIGKILL and returning an exit code of 137.
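For anyone wondering how 137 maps to SIGKILL: shells conventionally report 128 plus the signal number when a process is terminated by a signal, and SIGKILL is signal 9. A minimal sketch of that decoding, assuming a POSIX system (describe_exit_code is a hypothetical helper name):

```python
import signal


def describe_exit_code(code):
    """Interpret a shell exit status using the 128 + signal convention."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            return "exited with status %d (unknown signal)" % code
        return "killed by %s (signal %d)" % (sig.name, sig.value)
    return "exited with status %d" % code


print(describe_exit_code(137))
```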
I tried Tensorflow Debugger tfdbg though this did not provide any further details as the problem was the graph did not initialize. I started to think the graph structure was incorrect, so I dumped out the graph structure using:
tf.summary.FileWriter('./logs/traing_graph', graph)
I then used TensorBoard to inspect the graph structure dumped to that directory and found that the tensor dimensions of the fully connected layer were wrong: it had a width of 15 million!
It turned out that one of the configurable parameters of the graph was incorrect. It was taking the layer 2 dimension from the wrong index of the previous tensor's shape, and this exploded the dimensions of the graph.
There were no OOM error messages in /var/log/system.log so I am unsure why the graph initialisation caused the python tensorflow script process to die.
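This does not settle why the process died, but a rough size estimate shows how much memory a layer that wide would ask for. The sketch below counts weights plus biases for a dense layer stored as float32. The input width is the 15 million from above; the output width of 1024 is purely a hypothetical value for illustration, since the real one is not given here.

```python
def dense_layer_bytes(n_in, n_out, bytes_per_value=4):
    """Bytes needed for the weights and biases of one dense layer."""
    params = n_in * n_out + n_out  # weight matrix plus bias vector
    return params * bytes_per_value


n_in = 15000000  # width seen in TensorBoard
n_out = 1024     # hypothetical output width, for illustration only

size = dense_layer_bytes(n_in, n_out)
print("%d bytes, about %.1f GiB" % (size, size / 2.0 ** 30))
```

With these assumed numbers the variables of that single layer alone come to tens of gigabytes, before counting gradients or optimizer state.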
I fixed the dimensions of the graph and graph initialization worked just fine!
My top tip is to visualise your graph with TensorBoard before initialisation and training, as a quick check that the graph structure you coded is what you expected it to be. You will probably save yourself a lot of time! :-)
A common way to debug TensorFlow is to replace the placeholders and/or variables with numpy arrays wrapped in tf.constant. When you do so, you can examine the logic of your code by setting breakpoints and seeing actual numbers in a "Pythonic" way, not just tensors. It would be much easier to help you if you posted your code here, but here is a dummy example:
import tensorflow as tf
with tf.name_scope('scope_name'):
    ### This block is for debug only
    import numpy as np
    batch_size = 20
    sess = tf.Session()
    init_op = tf.global_variables_initializer()
    ### End of first debug block
    ## Replacing placeholders for debug - uncomment the placeholders and comment out the numpy arrays for production mode
    const_a = tf.constant((np.random.rand(batch_size, 26) > 0.85).astype(int), dtype=tf.float32)
    const_b = tf.constant(np.random.randint(0, 20, batch_size * 26).reshape((batch_size, 26)), dtype=tf.float32)
    # real_a_placeholder = tf.log(input_placeholder_dict[A_DATA])
    # real_b_placeholder = tf.log(input_placeholder_dict[B_DATA])
    # dummy operation
    c = const_a - const_b
    # selecting top k - in the sanity check you can see here that you actually get the top items and top values
    top_k = 5
    top_k_values, top_k_indices = tf.nn.top_k(c,
                                              k=top_k, sorted=True)
    ## Replacing variables for debug - uncomment the variables and comment out the numpy arrays for production mode
Now, run your code with breakpoints and you have 2 options to see the values in the debugger:
2. You can use eval: variable_name.eval(session=sess)

How to use tf.cond in combination with batching operations / queue runners

I want to train a specific network architecture (a GAN) that needs inputs from different sources during training.
One input source is examples loaded from disk. The other source is a generator sub-network creating examples.
To choose which kind of input to feed to the network I use tf.cond. There is one caveat though that has already been explained: tf.cond evaluates the inputs to both conditional branches even though only one of those will ultimately be used.
Enough setup, here is a minimal working example:
import numpy as np
import tensorflow as tf
def load_input_data():
    # Normally this data would be read from disk
    data = tf.reshape(np.arange(10 * BATCH_SIZE, dtype=np.float32), shape=(10 * BATCH_SIZE, 1))
    return tf.train.batch([data], BATCH_SIZE, enqueue_many=True)
def generate_input_data():
    # Normally this data would be generated by a much bigger sub-network
    return tf.random_uniform(shape=[BATCH_SIZE, 1])
def main():
    # A bool to choose between loaded or generated inputs
    load_inputs_pred = tf.placeholder(dtype=tf.bool, shape=[])
    # Variant 1: Call "load_input_data" inside tf.cond
    data_batch = tf.cond(load_inputs_pred, load_input_data, generate_input_data)
    # Variant 2: Call "load_input_data" outside tf.cond
    #loaded_data = load_input_data()
    #data_batch = tf.cond(load_inputs_pred, lambda: loaded_data, generate_input_data)
    init_op = tf.initialize_all_variables()
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        # Get generated input data
        data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: False})
        # Get input data loaded from disk
        data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: True})
if __name__ == '__main__':
    main()
Variant 1 does not work at all since the queue runner threads don't seem to run. print(threads) outputs something like [<Thread(Thread-1, stopped daemon 140165838264064)>, ...].
Variant 2 does work and print(threads) outputs something like [<Thread(Thread-1, started daemon 140361854863104)>, ...]. But since load_input_data() has been called outside of tf.cond, batches of data will be loaded from disk even when load_inputs_pred is False.
Is it possible to make Variant 1 work, so that input data is only loaded when load_inputs_pred is True and not for every call to session.run()?
If you're using a queue when loading your data and follow it up with a batch input then this shouldn't be a problem as you can specify the max amount to have loaded or stored in the queue.
input = tf.WholeFileReader(somefilelist) # or another way to load data
return tf.train.batch(input,batch_size=10,capacity=100)
See here for more details:
There's also an alternative approach that skips tf.cond completely. Just define two losses: one that follows the data through the autoencoder and discriminator, and another that follows the data through just the discriminator.
Then it just becomes a matter of calling
In this way the graph will only run through whichever loss was called upon. Let me know if this needs more explanation.
Lastly I think to make variant one work you need to do something like this if you're using preloaded data.
Otherwise I'm not sure what the issue is to be honest.

How to create a Tensorflow Tensorboard Empty Graph

I launch TensorBoard with tensorboard --logdir=/home/vagrant/notebook
At tensorboard:6006 > graph, it says No graph definition files were found.
To store a graph, create a tf.python.training.summary_io.SummaryWriter and pass the graph either via the constructor, or by calling its add_graph() method.
import tensorflow as tf
sess = tf.Session()
writer = tf.python.training.summary_io.SummaryWriter("/home/vagrant/notebook", sess.graph_def)
However the page is still empty, how can I start playing with tensorboard?
current tensorboard
result wanted
An empty graph that can add nodes, editable.
It seems TensorBoard is unable to create a graph in which I can add nodes, drag and edit, etc. (I am confused by the official video.)
Running https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py and then tensorboard --logdir=/home/vagrant/notebook/data lets me view the graph.
However, it seems TensorFlow only provides the ability to view summaries, with nothing much to make it stand out.
TensorBoard is a tool for visualizing the TensorFlow graph and analyzing recorded metrics during training and inference. The graph is created using the Python API, then written out using the tf.train.SummaryWriter.add_graph() method. When you load the file written by the SummaryWriter into TensorBoard, you can see the graph that was saved, and interactively explore it.
However, TensorBoard is not a tool for building the graph itself. It does not have any support for adding nodes to the graph.
Starting from the following Code Example, I can add one line as shown below:
import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession() #define a session
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype("float32")
y_data = x_data * 0.1 + 0.3
# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but Tensorflow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.initialize_all_variables()
# Launch the graph.
sess = tf.Session()
sess.run(init)
#### ----> ADD THIS LINE <---- ####
writer = tf.train.SummaryWriter("/tmp/test", sess.graph)
# Fit the line.
for step in xrange(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))
# Learns best fit is W: [0.1], b: [0.3]
And then run tensorboard from the command line, pointing to the appropriate directory. This shows a complete call for the SummaryWriter. It is important to note the following things:
SummaryWriter is passed a Session, and so must happen after the Session (or InteractiveSession) is created
That Session may be created early in the program, but when the Session is passed to the SummaryWriter, the graph as it exists at that point is written to the file that the TensorBoard will use.
In this page, there is a very simple code that you can use to test your installation: http://tensorflow.org/get_started
I included this line
tf.train.write_graph(sess.graph_def, '/home/daniel/Documents/Projetos/Prorum/ProgramasEmPython/TestingTensorFlow/fileGraph', 'graph.pbtxt')
After this "sess.run(init)"
This will generate a file that you have to upload to the "TensorBoard".
In order to open the TensorBoard, supposing that it is installed in your computer (it must be if you use pip to install), I used the terminal of Ubuntu and wrote:
"tensorboard --logdir nameOfDirectory"
Then, you should open your browser in Port 6006:
This will open the TensorBoard. I went to the "Graph Menu" and uploaded the file. It generated this figure below:
So, what I have done is to transfer the model I created in Python to TensorBoard. I believe that it is possible to create an empty one, if no model is created (only the session is initiated). However, I am not sure if you are able to change this directly in the TensorBoard.
I have answered before this question here in Portuguese with more details for Brazilian users. Maybe it can be useful for other people: http://prorum.com/index.php/1843/recentemente-plataforma-aprendizagem-primeira-impressao
I solved it on Windows with:
file_writer = tf.summary.FileWriter("output", sess.graph)
For that directory, "output", I opened a command prompt on Windows and ran:
tensorboard --logdir="C:\Users\kiran\machine Learning\output"
My mistake was on that line.
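A wrong --logdir is easy to check for before starting TensorBoard, because the writer stores its output in files whose names contain "tfevents". The sketch below walks a directory and lists such files. find_event_files is a hypothetical helper, and the temporary directory with its dummy file only stands in for a real log directory.

```python
import os
import tempfile


def find_event_files(logdir):
    """Return sorted paths of files under logdir that look like event files."""
    matches = []
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            if "tfevents" in name:
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Demo on a throwaway directory: empty at first, then with one dummy file.
with tempfile.TemporaryDirectory() as logdir:
    empty_result = find_event_files(logdir)
    open(os.path.join(logdir, "events.out.tfevents.0.demo"), "w").close()
    found = find_event_files(logdir)

print(empty_result)
print(len(found))
```

If the list is empty for the directory passed to --logdir, TensorBoard has nothing to show, and the path or the writer call is the first thing to check.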
The graphs in TensorBoard do not show up if you are using Firefox. You have to install Chrome.
result wanted
An empty graph that can add nodes, editable.
I think you will find the Orange tool useful. It allows you to drag and drop various nodes and implement algorithms via GUI.
I had to use
python -m tensorflow.tensorboard --logdir="C:\tmp\tensorflow\.."
somehow tensorboard --logdir didn't work.
My environment
OS: Windows 7, Python 3.5, and Tensorflow 1.1.0

