I am trying to use the function below to crop a very large number of images (hundreds of thousands). I am doing this operation serially, but it takes a lot of time. What is an efficient way to do this?
tf.image.crop_to_bounding_box
Below is my code:
def crop_images(img_dir, list_images):
    outlist = []
    with tf.Session() as session:
        for image1 in list_images[:5]:
            image = mpimg.imread(img_dir + image1)
            x = tf.Variable(image, name='x')
            data_t = tf.placeholder(tf.uint8)
            op = tf.image.encode_jpeg(data_t, format='rgb')
            model = tf.global_variables_initializer()
            img_name = "img/" + image1.split("_img_0")[0] + "/img_0" + image1.split("_img_0")[1]
            height = x.shape[1]
            [x1, y1, x2, y2] = img_bbox_dict[img_name]
            x = tf.image.crop_to_bounding_box(x, int(y1), int(x1), int(y2) - int(y1), int(x2) - int(x1))
            session.run(model)
            result = session.run(x)
            data_np = session.run(op, feed_dict={data_t: result})
            with open(img_path + image1, 'w+') as fd:
                fd.write(data_np)
I'll give a simplified version of one of the examples from TensorFlow's Programmer's Guide on reading data, which can be found here. Basically, it uses a Reader and a filename queue to batch together image data using a specified number of threads. These threads are coordinated using what is called a thread Coordinator.
import tensorflow as tf
import glob
images_path = "./" #RELATIVE glob pathname of current directory
images_extension = "*.png"
# Save the list of files matching pattern, so it is only computed once.
filenames = tf.train.match_filenames_once(glob.glob(images_path+images_extension))
batch_size = len(glob.glob1(images_path,images_extension))
num_epochs=1
standard_size = [500, 500]
num_channels = 3
min_after_dequeue = 10
num_preprocess_threads = 3
seed = 14131
"""
IMPORTANT: Cropping params. These are arbitrary values used only for this example.
You will have to change them according to your requirements.
"""
crop_size=[200,200]
boxes = [1,1,460,460]
"""
'WholeFileReader' is a Reader who's 'read' method outputs the next
key-value pair of the filename and the contents of the file (the image) from
the Queue, both of which are string scalar Tensors.
Note that the The QueueRunner works in a thread separate from the
Reader that pulls filenames from the queue, so the shuffling and enqueuing
process does not block the reader.
'resize_images' is used so that all images are resized to the same
size (Aspect ratios may change, so in that case use resize_image_with_crop_or_pad)
'set_shape' is used because the height and width dimensions of 'image' are
data dependent and cannot be computed without executing this operation. Without
this Op, the 'image' Tensor's shape will have None as Dimensions.
"""
def read_my_file_format(filename_queue, standard_size, num_channels):
    image_reader = tf.WholeFileReader()
    _, image_file = image_reader.read(filename_queue)
    if "jpg" in images_extension:
        image = tf.image.decode_jpeg(image_file)
    elif "png" in images_extension:
        image = tf.image.decode_png(image_file)
    image = tf.image.resize_images(image, standard_size)
    image.set_shape(standard_size + [num_channels])
    print "Successfully read file!"
    return image
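If you would rather keep the aspect ratio, a minimal sketch of that variant could look like the following; it assumes PNG input and the same standard_size and num_channels as above, so adapt it to your data:
def read_my_file_format_keep_aspect(filename_queue, standard_size, num_channels):
    image_reader = tf.WholeFileReader()
    _, image_file = image_reader.read(filename_queue)
    image = tf.image.decode_png(image_file)
    # Pads or center-crops to the target size instead of stretching,
    # so the aspect ratio of the original image is preserved.
    image = tf.image.resize_image_with_crop_or_pad(image, standard_size[0], standard_size[1])
    image.set_shape(standard_size + [num_channels])
    return image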
"""
'string_input_producer' Enters matched filenames into a 'QueueRunner' FIFO Queue.
'shuffle_batch' creates batches by randomly shuffling tensors. The 'capacity'
argument controls how long the prefetching is allowed to grow the queues.
'min_after_dequeue' defines how big a buffer we will randomly
sample from -- bigger means better shuffling but slower startup & more memory used.
'capacity' must be larger than 'min_after_dequeue' and the amount larger
determines the maximum we will prefetch.
Recommendation: min_after_dequeue + (num_threads + a small safety margin) * batch_size
"""
def input_pipeline(filenames, batch_size, num_epochs, standard_size, num_channels, min_after_dequeue, num_preprocess_threads, seed):
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=num_epochs, shuffle=True)
    example = read_my_file_format(filename_queue, standard_size, num_channels)
    capacity = min_after_dequeue + 3 * batch_size
    example_batch = tf.train.shuffle_batch([example], batch_size=batch_size, capacity=capacity, min_after_dequeue=min_after_dequeue, num_threads=num_preprocess_threads, seed=seed, enqueue_many=False)
    print "Batching Successful!"
    return example_batch
"""
Any transformation on the image batch goes here. Refer to the documentation
for the details of how the cropping is done using this function.
"""
def crop_batch(image_batch, batch_size, b_boxes, crop_size):
    cropped_images = tf.image.crop_and_resize(image_batch, boxes=[b_boxes for _ in xrange(batch_size)], box_ind=[i for i in xrange(batch_size)], crop_size=crop_size)
    print "Cropping Successful!"
    return cropped_images
example_batch = input_pipeline(filenames, batch_size, num_epochs, standard_size, num_channels, min_after_dequeue, num_preprocess_threads, seed)
cropped_images = crop_batch(example_batch, batch_size, boxes, crop_size)
"""
if 'num_epochs' is not `None`, the 'string_input_producer' function creates local
counter `epochs`. Use `local_variables_initializer()` to initialize local variables.
'Coordinator' class implements a simple mechanism to coordinate the termination
of a set of threads. Any of the threads can call `coord.request_stop()` to ask for all
the threads to stop. To cooperate with the requests, each thread must check for
`coord.should_stop()` on a regular basis.
`coord.should_stop()` returns `True` as soon as `coord.request_stop()` has been called.
A thread can report an exception to the coordinator as part of the `should_stop()`
call. The exception will be re-raised from the `coord.join()` call.
After a thread has called `coord.request_stop()` the other threads have a
fixed time to stop, this is called the 'stop grace period' and defaults to 2 minutes.
If any of the threads is still alive after the grace period expires `coord.join()`
raises a RuntimeError reporting the laggards.
IMPORTANT: 'start_queue_runners' starts threads for all queue runners collected in
the graph, & returns the list of all threads. This must be executed BEFORE running
any other training/inference/operation steps, or it will hang forever.
"""
with tf.Session() as sess:
    _, _ = sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            # Run training steps or whatever
            cropped_images1 = sess.run(cropped_images)
            print cropped_images1.shape
    except tf.errors.OutOfRangeError:
        print('Load and Process done -- epoch limit reached')
    finally:
        # When done, ask the threads to stop.
        coord.request_stop()
    coord.join(threads)
    sess.close()
I have written a custom keras callback to check the augmented data from a generator. (See this answer for the full code.) However, when I tried to use the same callback for a tf.data.Dataset, it gave me an error:
File "/path/to/tensorflow_image_callback.py", line 16, in on_batch_end
imgs = self.train[batch][images_or_labels]
TypeError: 'PrefetchDataset' object is not subscriptable
Do keras callbacks in general only work with generators, or is it something about the way I've written mine? Is there a way to modify either my callback or the dataset to make it work?
I think there are three pieces to this puzzle. I'm open to changes to any and all of them. Firstly, the init function in the custom callback class:
class TensorBoardImage(tf.keras.callbacks.Callback):
    def __init__(self, logdir, train, validation=None):
        super(TensorBoardImage, self).__init__()
        self.logdir = logdir
        self.file_writer = tf.summary.create_file_writer(logdir)
        self.train = train
        self.validation = validation
Secondly, the on_batch_end function within that same class:
def on_batch_end(self, batch, logs):
    images_or_labels = 0  #0=images, 1=labels
    imgs = self.train[batch][images_or_labels]
Thirdly, instantiating the callback:
import tensorflow_image_callback
tensorboard_image_callback = tensorflow_image_callback.TensorBoardImage(logdir=tensorboard_log_dir, train=train_dataset, validation=valid_dataset)

model.fit(train_dataset,
          epochs=n_epochs,
          validation_data=valid_dataset,
          callbacks=[
              tensorboard_callback,
              tensorboard_image_callback
          ])
Some related threads which haven't led me to an answer yet:
Accessing validation data within a custom callback
Create keras callback to save model predictions and targets for each batch during training
What ended up working for me was the following, using tfds:
the __init__ function:
def __init__(self, logdir, train, validation=None):
    super(TensorBoardImage, self).__init__()
    self.logdir = logdir
    self.file_writer = tf.summary.create_file_writer(logdir)
    # #from keras generator
    # self.train = train
    # self.validation = validation
    #from tf.Data
    my_data = tfds.as_numpy(train)
    imgs = my_data['image']
then on_batch_end:
def on_batch_end(self, batch, logs):
    images_or_labels = 0  #0=images, 1=labels
    imgs = self.train[batch][images_or_labels]

    #calculate epoch
    n_batches_per_epoch = self.train.samples / self.train.batch_size
    epoch = math.floor(self.train.total_batches_seen / n_batches_per_epoch)

    #since the training data is shuffled each epoch, we need to use the index_array to find something which uniquely
    #identifies the image and is constant throughout training
    first_index_in_batch = batch * self.train.batch_size
    last_index_in_batch = first_index_in_batch + self.train.batch_size
    last_index_in_batch = min(last_index_in_batch, len(self.train.index_array))
    img_indices = self.train.index_array[first_index_in_batch : last_index_in_batch]

    with self.file_writer.as_default():
        for ix, img in enumerate(imgs):
            #only post 1 out of every 1000 images to tensorboard
            if (img_indices[ix] % 1000) == 0:
                #instead of img_filename, I could just use str(img_indices[ix]) as a unique identifier
                #but this way makes it easier to find the unaugmented image
                img_filename = self.train.filenames[img_indices[ix]]
                #convert float to uint8, shift range to 0-255
                img -= tf.reduce_min(img)
                img *= 255 / tf.reduce_max(img)
                img = tf.cast(img, tf.uint8)
                img_tensor = tf.expand_dims(img, 0)  #tf.summary needs a 4D tensor
                tf.summary.image(img_filename, img_tensor, step=epoch)
I didn't need to make any changes to the instantiation.
I recommend only using it for debugging, otherwise it saves every nth image in your dataset to tensorboard every epoch. That can end up using a lot of disk space.
I'm using a simple method to extract descriptors from images and save them to disk into a .csv file. I have around 1M images and my network returns 512 features per image (float32).
Therefore, I estimate that at the end of the loop I would have about 1e6 * 512 * 4 / 1e9 = ~2 GB of features (float32 is 4 bytes per value). However, I observed that it is using more than twice the memory.
index is a string and class_id is an int64, so I don't think they are the culprit here.
I have already tried using gc.collect() without any success. Do you think my code is leaving references behind?
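As a sanity check, here is a quick back-of-the-envelope calculation of the expected size of the feature matrix alone (1M rows of 512 float32 values, as described above):
import numpy as np

n_images, n_features = 10**6, 512
bytes_per_value = np.dtype(np.float32).itemsize  # 4 bytes per float32
print(n_images * n_features * bytes_per_value / 1e9)  # ~2.0 GB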
Here is the method:
def prepare_gallery(self, data_loader, TTA, pbar=False, dump_path=None):
    '''Compute embeddings for a data_loader and store it in model.
    This is required before predicting to a test set.
    New entries should be removed from data before calling this function
    to avoid inferring on useless images.
    data_loader: A linear loader containing the database that test is
    compared against.'''
    self.set_mode('valid')
    self.net.cuda()
    n_iter = len(data_loader.dataset) / data_loader.batch_size
    if pbar:
        loader = tqdm(enumerate(data_loader), total=n_iter)
    else:
        loader = enumerate(data_loader)
    # Run inference and get embeddings
    feat_list = []
    index_list = []
    class_list = []
    for i, (index, im, class_id) in loader:
        with torch.no_grad():
            feat = tta(self.net, im)
            # Returns something like np.random.random((32, 512))
        feat_list.extend(feat)
        index_list.extend(index)
        class_list.extend(class_id.item())
    if dump_path is not None:
        np.save(dump_path + '_ids', index_list)
        np.save(dump_path + '_cls', class_list)
        np.save(dump_path + '_feat', feat_list)
    return np.asarray(index_list), np.asarray(feat_list), np.asarray(class_list)
Context: I am training a model using an Estimator. Without extraneous details, I am using queues to read in a series of input images, which are batched and manipulated using an input function that I call "read_pics_batch":
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    keypoint_regression.fit(
        input_fn=lambda: inp_mod.read_pics_batch(names_train,
            joint_annopoints_train, num_in_batch, max_num_epochs, 'TRAIN'),
        steps=max_steps_per_epoch * max_num_epochs,  # max number of steps
        monitors=[logging_hook])

    coord.request_stop()
    coord.join(threads)
The input function has the following form, where I am also randomising the input file order:
def read_pics_batch(names_list, joint_list, batch_size, max_num_epochs, task):
    names_tensor = tf.convert_to_tensor(names_list, dtype=tf.string)
    joint_total_tensor = tf.convert_to_tensor(joint_list, dtype=tf.int32)
    min_after_dequeue = 100
    capacity = min_after_dequeue + 3 * batch_size
    file_pattern = [("...")]
    examples = graph_io.read_keyed_batch_examples(file_pattern, batch_size,
        reader=tf.WholeFileReader, randomize_input=True,
        parse_fn=example_to_standard_pic,
        num_epochs=max_num_epochs, queue_capacity=capacity)
My questions are as follows:
1) Is there any way to restore the queue state from a checkpoint, like any other variable? If "randomize_input" from read_keyed_batch_examples were set to False, then at each restart of the training_op I would read the same input files over and over, which is clearly not what I want.
2) If randomize_input = True, how exactly does the queue decide which files to enqueue? I see two possible options and I am unsure which is correct:
it selects a short-list of size "capacity" (from the full-list given by all the filenames defined by "file_pattern") and then randomises the order of the names in this short-list
it randomises the names in the full-list first, and then creates a short-list of size "capacity" out of this
If the second case applies, I don't believe that I would actually need to restore the queue state, since I would in principle read different files every time, but if the first case applies, I would still be reading the same few files over and over, just in a different order.
Thank you for your time!
I'd like to compute the mean of each of the RGB channels of a set of images in a multithreaded manner.
My idea was to have a string_input_producer that fills a filename_queue and then have a second FIFOQueue that is filled by num_threads threads that load images from the filenames in filename_queue, perform some ops on them and then enqueue the result.
This second queue is then accessed by one single thread (the main thread) that sums up all the values from the queue.
This is the code I have:
# variables for storing the mean and some intermediate results
mean = tf.Variable([0.0, 0.0, 0.0])
total = tf.Variable(0.0)
# the filename queue and the ops to read from it
filename_queue = tf.train.string_input_producer(filenames, num_epochs=1)
reader = tf.WholeFileReader()
_, value = reader.read(filename_queue)
image = tf.image.decode_jpeg(value, channels=3)
image = tf.cast(image, tf.float32)
sum = tf.reduce_sum(image, [0, 1])
num = tf.mul(tf.shape(image)[0], tf.shape(image)[1])
num = tf.cast(num, tf.float32)
# the second queue and its enqueue op
queue = tf.FIFOQueue(1000, dtypes=[tf.float32, tf.float32], shapes=[[3], []])
enqueue_op = queue.enqueue([sum, num])
# the ops performed by the main thread
img_sum, img_num = queue.dequeue()
mean_op = tf.add(mean, img_sum)
total_op = tf.add(total, img_num)
# adding new queue runner that performs enqueue_op on num_threads threads
qr = tf.train.QueueRunner(queue, [enqueue_op] * num_threads)
tf.train.add_queue_runner(qr)
init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
# the main loop being executed until the OutOfRangeError
# (when filename_queue does not yield elements anymore)
try:
    while not coord.should_stop():
        mean, total = sess.run([mean_op, total_op])
except tf.errors.OutOfRangeError:
    print 'All images processed.'
finally:
    coord.request_stop()
    coord.join(threads)
# some additional computations to get the mean
total_3channel = tf.pack([total, total, total])
mean = tf.div(mean, total_3channel)
mean = sess.run(mean)
print mean
The problem is that each time I run this code I get different results, for example:
[ 99.35347748 58.35261154 44.56705856]
[ 95.91153717 92.54192352 87.48269653]
[ 124.991745 121.83417511 121.1891861 ]
I blame this on race conditions. But where do those race conditions come from? Can someone help me out?
Your QueueRunner will start num_threads threads which will race to access your reader and push the result onto the queue. The order of images on the queue will vary depending on which thread finishes first.
Update Feb 12
A simple example of chaining two queues, and summing up values from the second queue. When using num_threads > 1, there's some non-determinism in the intermediate values, but the final value will always be 30. When num_threads=1, everything is deterministic.
tf.reset_default_graph()
queue_dtype = np.int32
# values_queue is a queue that will be filled with 0,1,2,3,4
# range_input_producer creates the queue and registers its queue_runner
value_queue = tf.range_input_producer(limit=5, num_epochs=1, shuffle=False)
value = value_queue.dequeue()
# value_squared_queue will be filled with 0,1,4,9,16
value_squared_queue = tf.FIFOQueue(capacity=50, dtypes=queue_dtype)
value_squared_enqueue = value_squared_queue.enqueue(tf.square(value))
value_squared = value_squared_queue.dequeue()
# value_squared_sum keeps running sum of squares of values
value_squared_sum = tf.Variable(0)
value_squared_sum_update = value_squared_sum.assign_add(value_squared)
# register queue_runner in the global queue runners collection
num_threads = 2
qr = tf.train.QueueRunner(value_squared_queue, [value_squared_enqueue] * num_threads)
tf.train.queue_runner.add_queue_runner(qr)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
tf.start_queue_runners()
for i in range(5):
    sess.run([value_squared_sum_update])
    print sess.run([value_squared_sum])
You should see:
[0]
[1]
[5]
[14]
[30]
Or sometimes (when the order of first 2 values is flipped)
[1]
[1]
[5]
[14]
[30]
I am trying to prefetch training data to hide I/O latency. I would like to write custom Python code that loads data from disk and preprocesses the data (e.g. by adding a context window). In other words, one thread does data preprocessing and the other does training. Is this possible in TensorFlow?
Update: I have a working example based on @mrry's example.
import numpy as np
import tensorflow as tf
import threading
BATCH_SIZE = 5
TRAINING_ITERS = 4100
feature_input = tf.placeholder(tf.float32, shape=[128])
label_input = tf.placeholder(tf.float32, shape=[128])
q = tf.FIFOQueue(200, [tf.float32, tf.float32], shapes=[[128], [128]])
enqueue_op = q.enqueue([label_input, feature_input])
label_batch, feature_batch = q.dequeue_many(BATCH_SIZE)
c = tf.reshape(feature_batch, [BATCH_SIZE, 128]) + tf.reshape(label_batch, [BATCH_SIZE, 128])
sess = tf.Session()
def load_and_enqueue(sess, enqueue_op, coord):
    with open('dummy_data/features.bin') as feature_file, open('dummy_data/labels.bin') as label_file:
        while not coord.should_stop():
            feature_array = np.fromfile(feature_file, np.float32, 128)
            if feature_array.shape[0] == 0:
                print('reach end of file, reset using seek(0,0)')
                feature_file.seek(0, 0)
                label_file.seek(0, 0)
                continue
            label_value = np.fromfile(label_file, np.float32, 128)
            sess.run(enqueue_op, feed_dict={feature_input: feature_array,
                                            label_input: label_value})
coord = tf.train.Coordinator()
t = threading.Thread(target=load_and_enqueue, args=(sess,enqueue_op, coord))
t.start()
for i in range(TRAINING_ITERS):
    sum = sess.run(c)
    print('train_iter=' + str(i))
    print(sum)
coord.request_stop()
coord.join([t])
This is a common use case, and most implementations use TensorFlow's queues to decouple the preprocessing code from the training code. There is a tutorial on how to use queues, but the main steps are as follows:
Define a queue, q, that will buffer the preprocessed data. TensorFlow supports the simple tf.FIFOQueue that produces elements in the order they were enqueued, and the more advanced tf.RandomShuffleQueue that produces elements in a random order (a minimal sketch follows this list). A queue element is a tuple of one or more tensors (which can have different types and shapes). All queues support single-element (enqueue, dequeue) and batch (enqueue_many, dequeue_many) operations, but to use the batch operations you must specify the shapes of each tensor in a queue element when constructing the queue.
Build a subgraph that enqueues preprocessed elements into the queue. One way to do this would be to define some tf.placeholder() ops for tensors corresponding to a single input example, then pass them to q.enqueue(). (If your preprocessing produces a batch at once, you should use q.enqueue_many() instead.) You might also include TensorFlow ops in this subgraph.
Build a subgraph that performs training. This will look like a regular TensorFlow graph, but will get its input by calling q.dequeue_many(BATCH_SIZE).
Start your session.
Create one or more threads that execute your preprocessing logic, then execute the enqueue op, feeding in the preprocessed data. You may find the tf.train.Coordinator and tf.train.QueueRunner utility classes useful for this.
Run your training graph (optimizer, etc.) as normal.
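For step 1, a minimal sketch of the shuffling variant could look like this; the capacity, min_after_dequeue, dtypes and shapes below are only illustrative values, not taken from the code fragment further down:
# Dequeues elements in random order once the queue holds at least
# `min_after_dequeue` elements; shapes are needed to use dequeue_many.
q = tf.RandomShuffleQueue(capacity=1000,
                          min_after_dequeue=100,
                          dtypes=[tf.float32, tf.int32],
                          shapes=[[100], []])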
EDIT: Here's a simple load_and_enqueue() function and code fragment to get you started:
# Features are length-100 vectors of floats
feature_input = tf.placeholder(tf.float32, shape=[100])
# Labels are scalar integers.
label_input = tf.placeholder(tf.int32, shape=[])
# Alternatively, could do:
# feature_batch_input = tf.placeholder(tf.float32, shape=[None, 100])
# label_batch_input = tf.placeholder(tf.int32, shape=[None])
q = tf.FIFOQueue(100, [tf.float32, tf.int32], shapes=[[100], []])
enqueue_op = q.enqueue([feature_input, label_input])
# For batch input, do:
# enqueue_op = q.enqueue_many([feature_batch_input, label_batch_input])
feature_batch, label_batch = q.dequeue_many(BATCH_SIZE)
# Build rest of model taking label_batch, feature_batch as input.
# [...]
train_op = ...
sess = tf.Session()
def load_and_enqueue():
    with open(...) as feature_file, open(...) as label_file:
        while True:
            feature_array = numpy.fromfile(feature_file, numpy.float32, 100)
            if feature_array.size == 0:
                return
            label_value = numpy.fromfile(label_file, numpy.int32, 1)[0]
            sess.run(enqueue_op, feed_dict={feature_input: feature_array,
                                            label_input: label_value})
# Start a thread to enqueue data asynchronously, and hide I/O latency.
t = threading.Thread(target=load_and_enqueue)
t.start()
for _ in range(TRAINING_EPOCHS):
    sess.run(train_op)
In other words, one thread does data preprocessing and the other does training. Is this possible in TensorFlow?
Yes, it is. mrry's solution works, but a simpler one exists.
Fetching data
tf.py_func wraps a Python function and uses it as a TensorFlow operator, so we can load the data at each sess.run(). The problem with this approach is that the data is loaded during sess.run() by the main thread.
A minimal example:
def get_numpy_tensor():
    return np.array([[1,2],[3,4]], dtype=np.float32)

tensorflow_tensor = tf.py_func(get_numpy_tensor, [], tf.float32)
A more complex example:
def get_numpy_tensors():
    # Load data from the disk into numpy arrays.
    input = np.array([[1,2],[3,4]], dtype=np.float32)
    target = np.int32(1)
    return input, target

tensorflow_input, tensorflow_target = tf.py_func(get_numpy_tensors, [], [tf.float32, tf.int32])
tensorflow_input, tensorflow_target = 2*tensorflow_input, 2*tensorflow_target

sess = tf.InteractiveSession()
numpy_input, numpy_target = sess.run([tensorflow_input, tensorflow_target])
assert np.all(numpy_input==np.array([[2,4],[6,8]])) and numpy_target==2
Prefetching data in another thread
To queue our data in another thread (so that sess.run() won't have to wait for the data), we can use tf.train.batch() on our operators from tf.py_func().
A minimal example:
tensor_shape = get_numpy_tensor().shape
tensorflow_tensors = tf.train.batch([tensorflow_tensor], batch_size=32, shapes=[tensor_shape])
# Run `tf.train.start_queue_runners()` once session is created.
We can omit the argument shapes if tensorflow_tensor has its shape specified:
tensor_shape = get_numpy_tensor().shape
tensorflow_tensor.set_shape(tensor_shape)
tensorflow_tensors = tf.train.batch([tensorflow_tensor], batch_size=32)
# Run `tf.train.start_queue_runners()` once session is created.
A more complex example:
input_shape, target_shape = (2, 2), ()
def get_numpy_tensors():
    input = np.random.rand(*input_shape).astype(np.float32)
    target = np.random.randint(10, dtype=np.int32)
    print('f', end='')
    return input, target
tensorflow_input, tensorflow_target = tf.py_func(get_numpy_tensors, [], [tf.float32, tf.int32])
batch_size = 2
tensorflow_inputs, tensorflow_targets = tf.train.batch([tensorflow_input, tensorflow_target], batch_size, shapes=[input_shape, target_shape], capacity=2)
# Internal queue will contain at most `capacity=2` times `batch_size=2` elements `[tensorflow_input, tensorflow_target]`.
tensorflow_inputs, tensorflow_targets = 2*tensorflow_inputs, 2*tensorflow_targets
sess = tf.InteractiveSession()
tf.train.start_queue_runners() # Internally, `tf.train.batch` uses a QueueRunner, so we need to ask tf to start it.
for _ in range(10):
    numpy_inputs, numpy_targets = sess.run([tensorflow_inputs, tensorflow_targets])
    assert numpy_inputs.shape==(batch_size, *input_shape) and numpy_targets.shape==(batch_size, *target_shape)
    print('r', end='')
# Prints `fffffrrffrfrffrffrffrffrffrffrf`.
In case get_numpy_tensor() returns a batch of tensors, then tf.train.batch(..., enqueue_many=True) will help.
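For completeness, here is a minimal sketch of that case; the hypothetical get_numpy_batch() returning 4 rows at a time and the batch_size of 32 are arbitrary illustrative values:
def get_numpy_batch():
    # Each call returns a whole batch of 4 examples of shape (2,) at once.
    return np.random.rand(4, 2).astype(np.float32)

tensorflow_batch = tf.py_func(get_numpy_batch, [], tf.float32)
tensorflow_batch.set_shape([None, 2])  # per-example shape must be fully defined for batching
# With enqueue_many=True the first dimension is treated as the example axis,
# so the internal queue is filled with individual rows rather than whole batches.
tensorflow_examples = tf.train.batch([tensorflow_batch], batch_size=32, enqueue_many=True)
# As before, run `tf.train.start_queue_runners()` once the session is created.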