Train model using queue Tensorflow

I designed a neural network in TensorFlow for my regression problem by following and adapting the TensorFlow tutorial. However, due to the structure of my problem (~300,000 data points and use of the costly FTRLOptimizer), training took too long even on my 32-CPU machine (I don't have GPUs).
According to this comment and a quick confirmation via htop, it appears that some of my operations are single-threaded, and feed_dict is the likely culprit.
Therefore, as advised here, I tried to use queues to make my program multi-threaded.
I wrote a simple script that uses a queue to train a model, as follows:
import numpy as np
import tensorflow as tf
import threading
#Function for enqueueing my data in parallel
def enqueue_thread():
    sess.run(enqueue_op, feed_dict={x_batch_enqueue: x, y_batch_enqueue: y})
#Set the number of couples (x, y) I use for "training" my model
BATCH_SIZE = 5
#Generate my data where y=x+1+little_noise
x = np.random.randn(10, 1).astype('float32')
y = x+1+np.random.randn(10, 1)/100
#Create the variables for my model y = x*W+b, then W and b should both converge to 1.
W = tf.get_variable('W', shape=[1, 1], dtype='float32')
b = tf.get_variable('b', shape=[1, 1], dtype='float32')
#Prepare the placeholders for enqueueing
x_batch_enqueue = tf.placeholder(tf.float32, shape=[None, 1])
y_batch_enqueue = tf.placeholder(tf.float32, shape=[None, 1])
#Create the queue
q = tf.RandomShuffleQueue(capacity=2**20, min_after_dequeue=BATCH_SIZE, dtypes=[tf.float32, tf.float32], seed=12, shapes=[[1], [1]])
#Enqueue operation
enqueue_op = q.enqueue_many([x_batch_enqueue, y_batch_enqueue])
#Dequeue operation
x_batch, y_batch = q.dequeue_many(BATCH_SIZE)
#Prediction with linear model + bias
y_pred=tf.add(tf.mul(x_batch, W), b)
#MAE cost function
cost = tf.reduce_mean(tf.abs(y_batch-y_pred))
learning_rate = 1e-3
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
available_threads = 1024
#Feed the queue
for i in range(available_threads):
    threading.Thread(target=enqueue_thread).start()
#Train the model
for step in range(1000):
    _, cost_step = sess.run([train_op, cost])
    print(cost_step)
Wf=sess.run(W)
bf=sess.run(b)
This code doesn't work because each time I evaluate x_batch, a y_batch is also dequeued, and vice versa. As a result, I am not comparing the features with the corresponding "result".
Is there an easy way to avoid this problem?

My mistake, everything worked fine.
I was misled because I evaluated my performance on a different batch at each step of the algorithm, and also because my model was too complicated for a dummy one (I should have used something like y=W*x or y=x+b).
Then, when I tried to print to the console, I executed sess.run several times on different variables. Each call dequeued a new batch, so I obviously got inconsistent results.
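The effect described here can be illustrated without TensorFlow. The sketch below is only a plain-Python analogy: a `queue.Queue` stands in for the TensorFlow queue, and `run_dequeue` is a hypothetical helper standing in for one `sess.run` call that evaluates the dequeue op. Each call consumes a fresh element, so values taken from separate calls belong to different elements, while a pair taken from a single call stays matched.

```python
import queue

# Stand-in for the TensorFlow queue: each element is an (x, y) pair with y = x + 1.
q = queue.Queue()
for i in range(6):
    q.put((float(i), float(i) + 1.0))

def run_dequeue():
    """Hypothetical stand-in for one sess.run call that evaluates the dequeue op."""
    return q.get()

# One call: x and y come from the same element, so they match.
x, y = run_dequeue()
paired_ok = (y == x + 1.0)

# Two separate calls: x comes from one element and y from the next one.
x_only, _ = run_dequeue()
_, y_only = run_dequeue()
mismatched = (y_only != x_only + 1.0)

print(paired_ok, mismatched)
```

In the training loop of the question, `sess.run([train_op, cost])` is a single call, which is why the training itself was consistent.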

Although your problem is solved, I wanted to point out a small inefficiency in your code. When you created your RandomShuffleQueue you specified capacity=2**20. For every queue, capacity is documented as:
The upper bound on the number of elements that may be stored in this
queue.
The enqueueing threads will keep adding elements until the queue hits this limit, and all of these elements consume RAM. If each element were only 1 byte, the full queue would take about 1 MB. With 10 KB images in the queue, it would take about 10 GB of RAM.
This is very wasteful, especially because you never need that many elements in the queue. All you need is to make sure the queue is never empty. So pick a reasonable capacity for the queue and avoid huge numbers.
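The arithmetic behind these figures can be checked with a few lines. This is a rough sketch: `queue_memory_bytes` is a hypothetical helper, and it counts only the payload of a full queue, ignoring any bookkeeping overhead.

```python
def queue_memory_bytes(capacity, bytes_per_element):
    """Upper bound on the payload held by a full queue (bookkeeping overhead ignored)."""
    return capacity * bytes_per_element

capacity = 2**20  # value used in the question

one_byte = queue_memory_bytes(capacity, 1)          # 1-byte elements
images = queue_memory_bytes(capacity, 10 * 1024)    # 10 KiB images

# Elements in the question: one float32 x and one float32 y, 4 bytes each.
floats = queue_memory_bytes(capacity, 2 * 4)

print(one_byte / 2**20, "MiB")
print(images / 2**30, "GiB")
print(floats / 2**20, "MiB")
```

For the toy model in the question the full queue holds only a few megabytes, but the same capacity with image-sized elements would not fit in the RAM of most machines.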

Related

How to use properly Tensorflow Dataset with batch?

I am new to Tensorflow and deep learning, and I am struggling with the Dataset class. I tried a lot of things and I can’t find a good solution.
What I am trying
I have a large amount of images (500k+) to train my DNN with. This is a denoising autoencoder so I have a pair of each image. I am using the dataset class of TF to manage the data, but I think I use it really badly.
Here is how I load the filenames in a dataset:
class Data:
    def __init__(self, in_path, out_path):
        self.nb_images = 512
        self.test_ratio = 0.2
        self.batch_size = 8
        # load filenames in input and outputs
        inputs, outputs, self.nb_images = self._load_data_pair_paths(in_path, out_path, self.nb_images)
        self.size_training = self.nb_images - int(self.nb_images * self.test_ratio)
        self.size_test = int(self.nb_images * self.test_ratio)
        # split arrays in training / validation
        test_data_in, training_data_in = self._split_test_data(inputs, self.test_ratio)
        test_data_out, training_data_out = self._split_test_data(outputs, self.test_ratio)
        # transform array to tf.data.Dataset
        self.train_dataset = tf.data.Dataset.from_tensor_slices((training_data_in, training_data_out))
        self.test_dataset = tf.data.Dataset.from_tensor_slices((test_data_in, test_data_out))
I have a function to call at each epoch that will prepare the dataset. It shuffles the filenames, and transforms filenames to images and batch data.
    def get_batched_data(self, seed, batch_size):
        nb_batch = int(self.size_training / batch_size)
        def img_to_tensor(path_in, path_out):
            img_string_in = tf.read_file(path_in)
            img_string_out = tf.read_file(path_out)
            im_in = tf.image.decode_jpeg(img_string_in, channels=1)
            im_out = tf.image.decode_jpeg(img_string_out, channels=1)
            return im_in, im_out
        t_datas = self.train_dataset.shuffle(self.size_training, seed=seed)
        t_datas = t_datas.map(img_to_tensor)
        t_datas = t_datas.batch(batch_size)
        return t_datas
Now during the training, at each epoch we call the get_batched_data function, make an iterator, and run it for each batch, then feed the array to the optimizer operation.
for epoch in range(nb_epoch):
    sess_iter_in = tf.Session()
    sess_iter_out = tf.Session()
    batched_train = data.get_batched_data(epoch, batch_size)
    iterator_train = batched_train.make_one_shot_iterator()
    in_data, out_data = iterator_train.get_next()
    total_batch = int(data.size_training / batch_size)
    for batch in range(total_batch):
        print(f"{batch + 1} / {total_batch}")
        in_images = sess_iter_in.run(in_data).reshape((-1, 64, 64, 1))
        out_images = sess_iter_out.run(out_data).reshape((-1, 64, 64, 1))
        sess.run(optimizer, feed_dict={inputs: in_images,
                                       outputs: out_images})
What do I need ?
I need to have a pipeline that loads only the images of the current batch (otherwise it will not fit in memory) and I want to shuffle the dataset in a different way for each epoch.
Questions and problems
First question: am I using the Dataset class properly? I have seen very different approaches on the internet. For example, in this blog post the dataset is used with a placeholder and fed with the data during training. That seems strange because the data are all in an array, and therefore loaded in memory. I don't see the point of using tf.data.Dataset in this case.
I found a solution that uses repeat(epoch) on the dataset, like this, but in that case the shuffle will not be different for each epoch.
The second problem with my implementation is that I get an OutOfRangeError in some cases. With a small amount of data (512, as in the example) it works fine, but with more data the error occurs. I thought it was caused by a wrong batch count due to bad rounding, or by the last batch being smaller, but it happens in batch 32 out of 115... Is there any way to know the number of batches created by a batch(n) call on a dataset?
Sorry for this long question, but I've been struggling with this for a few days.

As far as I know, the official Performance Guide is the best material for learning how to build input pipelines.
"I want to shuffle the dataset in a different way for each epoch."
Using shuffle() and repeat(), you get a different shuffle order for each epoch. You can confirm it with the following code:
dataset = tf.data.Dataset.from_tensor_slices([1,2,3,4])
dataset = dataset.shuffle(4)
dataset = dataset.repeat(3)
iterator = dataset.make_one_shot_iterator()
x = iterator.get_next()
with tf.Session() as sess:
    for i in range(10):
        print(sess.run(x))
You can also use tf.contrib.data.shuffle_and_repeat, as mentioned on the official page above.
There are some problems in your code beyond the data pipeline itself. You are confusing graph construction with graph execution. You rebuild the input pipeline at every epoch, so the graph ends up with as many redundant input pipelines as there are epochs. You can see the redundant pipelines in TensorBoard.
You should move the graph construction code outside of the loop, as in the following pseudocode:
batched_train = data.get_batched_data()
iterator = batched_train.make_initializable_iterator()
in_data, out_data = iterator.get_next()
for epoch in range(nb_epoch):
    # reset iterator's state
    sess.run(iterator.initializer)
    try:
        while True:
            # fetch both tensors in one call so the pair comes from the same batch
            in_images, out_images = sess.run([in_data, out_data])
            in_images = in_images.reshape((-1, 64, 64, 1))
            out_images = out_images.reshape((-1, 64, 64, 1))
            sess.run(optimizer, feed_dict={inputs: in_images,
                                           outputs: out_images})
    except tf.errors.OutOfRangeError:
        pass
There are also some minor inefficiencies. You loaded the list of file paths with from_tensor_slices(), so the list is embedded in your graph. (See https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays for details.)
You would also be better off using prefetch, and reducing the number of sess.run calls by combining the input pipeline and the model into one graph.
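On the question of how many batches a batch(n) call produces: for a dataset with a known number of elements, this is plain arithmetic. The sketch below uses a hypothetical helper, `num_batches`. Its `drop_remainder` flag is meant to mirror the option of the same name that `Dataset.batch` accepts in newer TensorFlow versions; by default the smaller last batch is kept.

```python
import math

def num_batches(num_examples, batch_size, drop_remainder=False):
    """Number of batches produced when num_examples elements are grouped by batch_size."""
    if drop_remainder:
        return num_examples // batch_size          # partial last batch discarded
    return math.ceil(num_examples / batch_size)    # partial last batch kept

# Values from the question: 512 images, 20% held out for testing, batch size 8.
size_training = 512 - int(512 * 0.2)
kept = num_batches(size_training, 8)
dropped = num_batches(size_training, 8, drop_remainder=True)
print(size_training, kept, dropped)
```

With these numbers, `int(size_training / batch_size)` equals the `drop_remainder=True` count, so a loop over that count stops one partial batch before the iterator is exhausted.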

Why is TensorFlow's `tf.data` package slowing down my code?

I'm just learning to use TensorFlow's tf.data API, and I've found that it is slowing my code down a lot, measured in time per epoch. This is the opposite of what it's supposed to do, I thought. I wrote a simple linear regression program to test it out.
Tl;Dr: With 100,000 training data, tf.data slows time per epoch down by about a factor of ten, if you're using full batch training. Worse if you use smaller batches. The opposite is true with 500 training data.
My question: What is going on? Is my implementation flawed? Other sources I've read have tf.data improving speeds by about 30%.
import tensorflow as tf
import numpy as np
import timeit
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.logging.set_verbosity(tf.logging.ERROR)
n_epochs = 10
input_dimensions_list = [10]
def function_to_approximate(x):
    return np.dot(x, random_covector).astype(np.float32) + np.float32(.01) * np.random.randn(1,1).astype(np.float32)
def regress_without_tfData(n_epochs, input_dimension, training_inputs, training_labels):
    tf.reset_default_graph()
    weights = tf.get_variable("weights", initializer=np.random.randn(input_dimension, 1).astype(np.float32))
    X = tf.placeholder(tf.float32, shape=(None, input_dimension), name='X')
    Y = tf.placeholder(tf.float32, shape=(None, 1), name='Y')
    prediction = tf.matmul(X,weights)
    loss = tf.reduce_mean(tf.square(tf.subtract(prediction, Y)))
    loss_op = tf.train.AdamOptimizer(.01).minimize(loss)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for _ in range(n_epochs):
            sess.run(loss_op, feed_dict={X: training_inputs, Y:training_labels})
def regress_with_tfData(n_epochs, input_dimension, training_inputs, training_labels, batch_size):
    tf.reset_default_graph()
    weights = tf.get_variable("weights", initializer=np.random.randn(input_dimension, 1).astype(np.float32))
    X,Y = data_set.make_one_shot_iterator().get_next()
    prediction = tf.matmul(X, weights)
    loss = tf.reduce_mean(tf.square(tf.subtract(prediction, Y)))
    loss_op = tf.train.AdamOptimizer(.01).minimize(loss)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        while True:
            try:
                sess.run(loss_op)
            except tf.errors.OutOfRangeError:
                break
for input_dimension in input_dimensions_list:
    for data_size in [500, 100000]:
        training_inputs = np.random.randn(data_size, input_dimension).astype(np.float32)
        random_covector = np.random.randint(-5, 5, size=(input_dimension, 1))
        training_labels = function_to_approximate(training_inputs)
        print("Not using tf.data, with data size "
              "{}, input dimension {} and training with "
              "a full batch, it took an average of "
              "{} seconds to run {} epochs.\n".
              format(
                  data_size,
                  input_dimension,
                  timeit.timeit(
                      lambda: regress_without_tfData(
                          n_epochs, input_dimension,
                          training_inputs, training_labels
                      ),
                      number=3
                  ),
                  n_epochs))
for input_dimension in input_dimensions_list:
    for data_size, batch_size in [(500, 50), (500, 500), (100000, 50), (100000, 100000)]:
        training_inputs = np.random.randn(data_size, input_dimension).astype(np.float32)
        random_covector = np.random.randint(-5, 5, size=(input_dimension, 1))
        training_labels = function_to_approximate(training_inputs)
        data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
        data_set = data_set.repeat(n_epochs)
        data_set = data_set.batch(batch_size)
        print("Using tf.data, with data size "
              "{}, and input dimension {}, and training with "
              "batch size {}, it took an average of {} seconds "
              "to run {} epochs.\n".
              format(
                  data_size,
                  input_dimension,
                  batch_size,
                  timeit.timeit(
                      lambda: regress_with_tfData(
                          n_epochs, input_dimension,
                          training_inputs, training_labels,
                          batch_size
                      ),
                      number=3
                  )/3,
                  n_epochs
              ))
This outputs for me:
Not using tf.data, with data size 500, input dimension 10 and training
with a full batch, it took an average of 0.20243382899980134 seconds
to run 10 epochs.
Not using tf.data, with data size 100000, input dimension 10 and
training with a full batch, it took an average of 0.2431719040000644
seconds to run 10 epochs.
Using tf.data, with data size 500, and input dimension 10, and
training with batch size 50, it took an average of 0.09512088866661846
seconds to run 10 epochs.
Using tf.data, with data size 500, and input dimension 10, and
training with batch size 500, it took an average of
0.07286913600000844 seconds to run 10 epochs.
Using tf.data, with data size 100000, and input dimension 10, and
training with batch size 50, it took an average of 4.421892363666605
seconds to run 10 epochs.
Using tf.data, with data size 100000, and input dimension 10, and
training with batch size 100000, it took an average of
2.2555197536667038 seconds to run 10 epochs.
Edit: Fixed an important issue that Fred Guth pointed out. It didn't much affect the results, though.

I wanted to test the dataset API, which seems really convenient for processing data. I spent a lot of time testing this API on CPU, GPU and multi-GPU setups, for small and large NNs with different types of data.
First, your code looks fine to me. But I need to point out that your NN is just one simple layer.
The dataset API is not well suited to that kind of NN; it pays off for NNs with much more complexity. Why? For several reasons that I explain below (found during my quest to understand the dataset API).
Firstly, the dataset API processes the data batch by batch, whereas in the other approach the data are preprocessed up front. Therefore, if the data fit in your RAM, you can save time by preprocessing them. Here your data are just too "simple". If you want to test what I am saying, try to find a really big dataset to process. Nevertheless, the dataset API can be tuned by prefetching data. You can take a look at this tutorial, which explains really well why it is good to process data with prefetch.
Secondly, in my exploration of the dataset API for multi-GPU training, I found that, as far as I know, the old preprocessing approach is faster than the dataset API for small neural networks. You can verify that by creating a simple stackable RNN which takes a sequence as input. You can try different stack sizes (I have tested 1, 2, 10 and 20). You will see that, using the dataset API, on 1 GPU or on 4 GPUs, the time does not differ for small RNN stacks (1, 2 and 5).
To summarize, the dataset API is suitable for neural networks whose data can't be preprocessed. Depending on your task, it may be more convenient to preprocess the data, for example if you want to tweak your NN in order to improve it. I agree that the dataset API is really cool for batching and padding, and convenient for shuffling large amounts of data, but in my experience it is not well suited to multi-GPU training.

First:
You are recreating the dataset unnecessarily.
data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
Create the dataset prior to the loop and change the regress_with_tfData input signature to use dataset instead of training_inputs and training_labels.
Second:
The problem here is that minibatches of size 50 or even 500 are too small to compensate for the latency cost of building batches with tf.data. You should increase the minibatch size. Interestingly, you did so with a minibatch of size 100000, but that may be too big (I am not certain of this; I think it would need more tests).
There are a couple of things you could try:
1) Increase the minibatch size to something like 10000 and see if you get an improvement
2) Change your pipeline to use an iterator, example:
data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
data_set = data_set.repeat(n_epochs)
data_set = data_set.batch(batch_size)
iterator = data_set.make_one_shot_iterator()
....
next_element = iterator.get_next()

That is because you are comparing apples with bananas.
On one hand, when using placeholders, you are providing a monolithic tensor as is. On the other hand, when using Dataset, you are slicing the tensor into individual samples. This is very different.
The equivalent of providing a monolithic placeholder tensor with the Dataset pipeline is tf.data.Dataset.from_tensors. When I use from_tensors in your example, I get similar (actually smaller) computation times than with placeholders.
If you want to compare a more sophisticated pipeline using from_tensor_slices, you should use a fair comparison with placeholders. For example, shuffle your data. Add some preprocessing on your slices. I have no doubt you will observe the performance gain that makes people switch to this pipeline.

One possible thing you are missing is a prefetch. Add a prefetch of 1 at the end of your data pipeline like so:
data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
data_set = data_set.repeat(n_epochs)
data_set = data_set.batch(batch_size).prefetch(1)
Adding a prefetch of 1 at the end of your dataset pipeline means the next batch is fetched while training on the current one is happening. This way you won't be waiting around while the batch is prepared; it should be ready to go as soon as each training iteration is done.
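The idea behind prefetching can be sketched in plain Python. This is only an analogy, not how tf.data implements it: `prefetch` and `make_batches` are hypothetical names, and a background thread with a bounded `queue.Queue` stands in for the prefetch buffer. The producer prepares the next batch while the consumer is still working on the current one.

```python
import queue
import threading

def prefetch(batches, buffer_size=1):
    """Yield items from `batches`, preparing up to `buffer_size` ahead in a background thread."""
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the input

    def producer():
        for batch in batches:
            buf.put(batch)  # blocks while the buffer is full
        buf.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item

def make_batches(n):
    """Toy batch source; real code would load and preprocess data here."""
    for i in range(n):
        yield [i, i + 1]

consumed = [b for b in prefetch(make_batches(4), buffer_size=1)]
print(consumed)
```

The order of the batches is unchanged; only the waiting time between them is reduced when preparing a batch and consuming it take comparable time.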

The accepted answer is no longer valid, as the TF behavior has changed. Per the documentation:
from_tensors produces a dataset containing only a single element. To
slice the input tensor into multiple elements, use from_tensor_slices
instead.
This means you cannot batch it:
X = np.arange(10)
data = tf.data.Dataset.from_tensors( X )
data = data.batch(2)
for t in data.as_numpy_iterator():
    print(t)
# only one row, whereas 5 were expected
The documentation recommends from_tensor_slices. But this has quite some overhead when compared to numpy slicing. Slow slicing is an open issue https://github.com/tensorflow/tensorflow/issues/39750
Essentially, slicing in TF is slow and impacts input-bound or light models such as small networks (regression, word2vec).
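For comparison, the NumPy slicing that this answer measures against can be written as a small generator. This is a minimal sketch with a hypothetical helper name, `numpy_batches`; basic slices of a NumPy array are views, so no data is copied when a batch is produced.

```python
import numpy as np

def numpy_batches(inputs, labels, batch_size):
    """Yield consecutive (inputs, labels) batches using plain NumPy slicing."""
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size
        yield inputs[start:end], labels[start:end]

X = np.arange(10)
Y = X * 2
batches = list(numpy_batches(X, Y, batch_size=2))
print(len(batches))
for batch_x, batch_y in batches:
    print(batch_x, batch_y)
```

Each batch can then be passed to the model directly, for example through a feed_dict in TF 1.x code.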

Preloading data in Tensorflow with shared layers

I have Tensorflow code for multi-task learning (one input, several outputs, similar to this: https://jg8610.github.io/Multi-Task/). For further explanation see below. The code works, but is slow as there's a lot of overhead from reading data in Python and feeding it to the GPU (with the tf.Session's feed_dict).
So my plan is now to preload the data according to https://www.tensorflow.org/programmers_guide/reading_data#preloaded_data [storing it in a tf.constant and using TF's queuing system]. This raises some problems, of which the most central for now seems to be:
If I preload the different task data into different tensors, I no longer have a task-generic X_in. That means that when declaring the shared layer, I now need to make a decision whether to connect it to X_input_task_A or X_input_task_B, and obviously that's not going to result in a shared layer.
My question
Would you have any idea how to solve this problem, i.e. to define shared layers with task-specific tensors, and then training by alternating between tasks? How would you alternatively call the different optimizer operations?
Further explanation on the Multi-task learning paradigm
For background, what the mentioned blog post (as well as my code so far) does is to define a placeholder X_in plus a shared layer that consumes that input op. Then, for each task we want to learn, we have different projections and loss functions that use task-specific placeholders y_task, and training happens by alternately running session.run(optimizer_task, feed_dict={X_in: X_batch_task, y_task: y_batch_task}), where optimizer_task is some task-specific optimizer. This is basically what my code does now - it works but is slow because I need to feed the data:
# PLACEHOLDERS
X_in = tf.placeholder(tf.float32, [batch_size, 100])
y_task_a = tf.placeholder(tf.float32, [batch_size, 4]) # 4 output classes
y_task_b = tf.placeholder(tf.float32, [batch_size, 2]) # 2 output classes
# SHARED LAYER
W = tf.get_variable("W", [100, 50])
shared_layer = tf.sigmoid(tf.matmul(X_in, W))
# TASK-SPECIFIC OUTPUTS
W_task_a = tf.get_variable("Wa", [50, 4])
W_task_b = tf.get_variable("Wb", [50, 2])
pred_task_a = tf.sigmoid(tf.matmul(shared_layer, W_task_a))
pred_task_b = tf.sigmoid(tf.matmul(shared_layer, W_task_b))
# TASK-SPECIFIC LOSSES AND OPTIMIZERS
loss_task_a = tf.nn.softmax_cross_entropy_with_logits(logits=pred_task_a, labels=y_task_a)
loss_task_b = tf.nn.softmax_cross_entropy_with_logits(logits=pred_task_b, labels=y_task_b)
optimizer_a = ...(loss_task_a)
optimizer_b = ...(loss_task_b)
# TRAINING
with tf.Session() as sess:
    for i in range(ITERS):
        # ALTERNATE BETWEEN TASKS, GET BATCH FROM DATA PER TASK AND TRAIN
        X_a, y_a = data_task_a.get_batch()
        X_b, y_b = data_task_b.get_batch()
        sess.run(optimizer_a, feed_dict={X_in: X_a, y_task_a: y_a})
        sess.run(optimizer_b, feed_dict={X_in: X_b, y_task_b: y_b})

How to use Tensorflow's batch_sequences_with_states utility

I am trying to build a generative RNN using Tensorflow. I have a preprocessed dataset which is a list of sequence_length x 2048 x 2 numpy arrays. The sequences have different lengths. I have been looking through examples and documentation but I really couldn't understand, for example, what key is, or how I should create the input_sequences dictionary, etc.
So how should one format a list of numpy arrays, each of which represents a sequence of rank n (2 in this case) tensors, in order to be able to use this batch_sequences_with_states method?

Toy Implementations
I tried this and I am glad to share my findings with you. It is a toy example: I attempted to create an example that works and to observe how the output varies. In particular, I used an LSTM as a case study. In your case, you can define a conv net instead. Feel free to add more inputs, adjust as usual, and follow the doc.
https://www.tensorflow.org/versions/r0.11/api_docs/python/contrib.training/splitting_sequence_inputs_into_minibatches_with_state_saving#batch_sequences_with_states
There are other more subtle examples I tried but I keep this simple version to show how the operation can be useful. In particular add more elements to the dictionaries (input sequence and context sequence) and observe the changes.
Two Approaches
Basically I will use two approaches:
tf.contrib.training.batch_sequences_with_states
tf.train.batch( )
I will start with the first one because it is directly helpful, then I will show how to solve a similar problem with train.batch.
I will generate toy numpy arrays and tensors and use them to test the operations.
import tensorflow as tf
batch_size = 32
num_unroll = 20
num_enqueue_threads = 20
lstm_size = 8
cell = tf.contrib.rnn.BasicLSTMCell(num_units=lstm_size)
# State size
state_size = cell.state_size[0]
# Initial states
initial_state_values = tf.zeros((state_size,), dtype=tf.float32)
initial_states = {"lstm_state": initial_state_values}
# Key should be string
#I used x as input sequence and y as input context. So that the
# keys should be 2.
key = ["1","2"]
#Toy data for our sample
x = tf.range(0, 12, name="x")
y = tf.range(12,24,name="y")
# Convert to float
# I converted to float to avoid a type mismatch error
x = tf.to_float(x)
y = tf.to_float(y)
#the input sequence as dictionary
#This is needed according to the tensorflow doc
sequences = {"x": x }
#Context Input
context = {"batch1": y}
# Train batch with sequence state
batch_new = tf.contrib.training.batch_sequences_with_states(
input_key=key,
input_sequences=sequences,
input_context=context,
initial_states=initial_states,
num_unroll=num_unroll,
batch_size=batch_size,
input_length = None,
pad = True,
num_threads=num_enqueue_threads,
capacity=batch_size * num_enqueue_threads * 2)
# To test what we have got type and observe the output of
# the following
# In short once in ipython notebook
# type batch_new.[press tab] to see all options
batch_new.key
batch_new.sequences
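# Split the input along the time axis. This generates one input per time step.
# Assumption: `inputs` is the batched "x" sequence, shaped [batch_size, num_unroll]
inputs = batch_new.sequences["x"]
inputs_by_time = tf.split(inputs, num_unroll, axis=1)
assert len(inputs_by_time) == num_unroll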
# Get lstm or conv net output
lstm_output, _ = tf.contrib.rnn.static_state_saving_rnn(
cell,
inputs_by_time,
state_saver=batch_new,
state_name=("lstm_state","lstm_state"))
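To get a feel for what num_unroll and pad=True mean, the segmenting step can be mimicked with NumPy. This is only a sketch of my reading of the documentation, not the utility's implementation, and `split_into_segments` is a hypothetical helper: a sequence is zero-padded to a multiple of num_unroll and cut into segments of num_unroll time steps each.

```python
import numpy as np

def split_into_segments(sequence, num_unroll):
    """Zero-pad `sequence` along time to a multiple of num_unroll, then split it into segments."""
    length = sequence.shape[0]
    num_segments = -(-length // num_unroll)  # ceiling division
    padded_length = num_segments * num_unroll
    # Pad only the time axis (axis 0); leave any feature axes untouched.
    pad_width = [(0, padded_length - length)] + [(0, 0)] * (sequence.ndim - 1)
    padded = np.pad(sequence, pad_width, mode="constant")
    return padded.reshape((num_segments, num_unroll) + sequence.shape[1:])

seq = np.arange(12, dtype=np.float32)  # same toy values as x above
segments = split_into_segments(seq, num_unroll=5)
print(segments.shape)
print(segments)
```

The same helper works for sequences with extra feature axes, for example an array shaped like sequence_length x 2048 x 2 from the question.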
Create Graph and Queue as Usual
The parts marked with # and * can be adapted further to suit your requirements.
# Create the graph, etc.
init_op = tf.global_variables_initializer()
#Create a session for running operations in the Graph.
sess = tf.Session()
# Initialize the variables (like the epoch counter).
sess.run(init_op)
# Start input enqueue threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
# For the part below, uncomment
#* the lines marked with asterisks to run other operations
#*try:
#*    while not coord.should_stop():
#*        # Run training steps or whatever
#*        sess.run(train_op) # uncomment to run other ops
#*except tf.errors.OutOfRangeError:
#*    print('Done training -- epoch limit reached')
#*finally:  # if uncommented, indent the calls below under it
# When done, ask the threads to stop.
coord.request_stop()
# Wait for threads to finish.
coord.join(threads)
sess.close()
Second Approach
You can also use train.batch in a very interesting way:
import tensorflow as tf
#[0, 1, 2, 3, 4 ,...]
x = tf.range(0, 11, name="x")
# A queue that outputs 0,1,2,3,..
# slice end is useful for dequeuing
slice_end = 10
# instantiate variable y
y = tf.slice(x, [0], [slice_end], name="y")
# Reshape y
y = tf.reshape(y,[10,1])
y=tf.to_float(y, name='ToFloat')
Important
Note the use of dynamic_pad and enqueue_many. Feel free to play with both options and compare the output!
batched_data = tf.train.batch(
tensors=[y],
batch_size=10,
dynamic_pad=True,
#enqueue_many=True,
name="y_batch"
)
batch_size = 128
lstm_cell = tf.contrib.rnn.LSTMCell(batch_size,forget_bias=1,state_is_tuple=True)
val, state = tf.nn.dynamic_rnn(lstm_cell, batched_data, dtype=tf.float32)
Conclusion
The aim is to show that by simple examples we can get insight into the
details of the operations. You can adapt it to convolutional net in your case.
Hope this helps!

Batching for a non-image data set with Tensorflow

I am a beginner in tensorflow.
I have a data set with 43 inputs and one output. I want to create mini-batches of the data to run deep learning.
Here are my inputs:
x = tf.placeholder(tf.float32, shape=[None, 43])
y_ = tf.placeholder(tf.float32, shape=[None])
which I am feeding from a MATLAB file, like this:
train_mat = train_mat["binary_train"].value
feed_dict={x:Train[0:100,0:43] , y_:Train[0:100,43]}
I want random batches instead of taking records 0:100.
I saw
tf.train.batch
but I could not figure out how it works.
Could you please guide me on how I can do that?
Thanks,
Afshin

tf.train.batch and other similar methods are based on queues, which are best suited to loading huge numbers of samples asynchronously and in parallel. The document here describes the basics of using queues in TensorFlow. There is also another blog describing how to read data from files.
If you are going to use queues, the placeholder and feed_dict are unnecessary.
For your specific case, a potential solution may look like this:
import tensorflow as tf
from tensorflow.python.training import queue_runner
# capacity and min_after_dequeue could be set according to your case
# assumption: train_mat is a float32 array of shape [num_rows, 44] (43 inputs + 1 output)
q = tf.RandomShuffleQueue(1000, 500, tf.float32, shapes=[[44]])
enq = q.enqueue_many([train_mat])
queue_runner.add_queue_runner(queue_runner.QueueRunner(q, [enq]))
# each dequeued element is one row of train_mat
deq = q.dequeue()
input = deq[0:43]
label = deq[43]
x, y_ = tf.train.batch([input, label], 100)
# then you can use x and y_ directly in inference and train process.
Code above is based on some hypothesis, because information provided in question is not sufficient. However, I hope the code could inspire you in some way.
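If the data already fits in memory and the placeholders from the question are kept, random mini-batches can also be drawn with NumPy alone. This is a minimal sketch under assumptions: `Train` is filled with random placeholder data here, `random_batch` is a hypothetical helper, and the layout (columns 0 to 42 are inputs, column 43 is the output) follows the slicing in the question.

```python
import numpy as np

rng = np.random.RandomState(0)
# Placeholder data standing in for the MATLAB matrix: 43 inputs + 1 output column.
Train = rng.rand(1000, 44).astype(np.float32)

def random_batch(data, batch_size, rng):
    """Pick batch_size distinct rows at random and split them into inputs and output."""
    idx = rng.choice(data.shape[0], size=batch_size, replace=False)
    batch = data[idx]
    return batch[:, 0:43], batch[:, 43]

batch_x, batch_y = random_batch(Train, 100, rng)
print(batch_x.shape, batch_y.shape)
# The batch can then be fed as before: feed_dict={x: batch_x, y_: batch_y}
```

Calling `random_batch` once per training step gives a different random subset each time, without any queue.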
