What is steps_per_epoch in model.fit_generator actually doing? - python

After reading the Keras documentation on the required steps_per_epoch argument of the model.fit_generator method, my understanding of it is as follows:
Suppose a dataset contains N samples and the generator function passed to Keras returns B = batch_size samples on every call (here I think of a call as a single yield from the generator function). Since steps_per_epoch = ceil(N/B), the generator is called steps_per_epoch times, so the full dataset passes through the model once per epoch, and this same process is repeated for every epoch until training is complete.
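The relationship described above is just a ceiling division. A minimal sketch (the helper name `steps_for_epoch` is hypothetical, not a Keras function):

```python
import math


def steps_for_epoch(num_samples, batch_size):
    """Number of generator calls needed to see every sample once: ceil(N / B)."""
    return math.ceil(num_samples / batch_size)


# With N = 10 samples and B = 2 samples per batch, one epoch needs 5 steps.
print(steps_for_epoch(10, 2))  # 5
# When B does not divide N, the last batch is partial, so we round up.
print(steps_for_epoch(10, 3))  # 4
```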
To test whether my understanding was correct, I implemented the following:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
index = 0  # counts how many times the batch generator has been resumed
def get_values(inputs, targets):
    # endlessly cycle over the (input, target) pairs
    i = 0
    while True:
        yield inputs[i], targets[i]
        i += 1
        if i >= len(inputs):
            i = 0
def get_batch(inputs, targets, batch_size=2):
    global index
    batch_X = []
    batch_Y = []
    for inp, targ in get_values(inputs, targets):
        batch_X.append(inp)
        batch_Y.append(targ)
        if len(batch_X) >= batch_size:
            yield np.array(batch_X), np.array(batch_Y)
            index += 1
            batch_X = []
            batch_Y = []
data = list(range(10))
labels = [2*val for val in range(10)]
model = Sequential([
    Dense(16, activation='relu', input_shape=(1, )),
    Dense(1),
])
model.compile(optimizer='rmsprop', loss='mean_squared_error')
model.fit_generator(get_batch(data, labels, batch_size=2), steps_per_epoch=5, epochs=1, verbose=False)
print(index)  # Should print 5 but it prints 15
The program isn't tough to understand...
But according to my interpretation it should print 5, yet it prints 15. Am I wrong in my interpretation of steps_per_epoch?
If so, please give me the correct interpretation of steps_per_epoch.
P.S. I'm new to TensorFlow and Keras. Thanks in advance.

I did not go through your code, but your original interpretation is correct. In fact, per the documentation, you can omit steps_per_epoch and model.fit will divide the length of your dataset (N) by the batch size to determine the number of steps. I copied and ran your code, and it printed the index as 5. The only thing I can think of that might be different is the imports.
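A likely factor behind the 15 is prefetching: Keras's `fit_generator` pulls batches through a queue whose size is set by its `max_queue_size` parameter (default 10), so a counter incremented inside the generator counts batches *produced*, which can run ahead of the `steps_per_epoch` batches actually *consumed*. The sketch below is a simplified, single-threaded simulation of that effect under that assumption; `train_with_prefetch` and `counting_generator` are hypothetical names, this is not Keras's real enqueuer, and the exact count in Keras depends on thread timing.

```python
from collections import deque


def counting_generator(counter):
    """Infinite batch generator that records how many batches it has produced."""
    batch = 0
    while True:
        counter["produced"] += 1
        yield batch
        batch += 1


def train_with_prefetch(gen, steps_per_epoch, max_queue_size):
    """Simplified stand-in for a prefetching loader: the buffer is filled
    up front and topped up again after every training step."""
    queue = deque()
    consumed = []

    def top_up():
        while len(queue) < max_queue_size:
            queue.append(next(gen))

    top_up()
    for _ in range(steps_per_epoch):
        consumed.append(queue.popleft())  # one batch per training step
        top_up()  # a background worker would refill the freed slot
    return consumed


counter = {"produced": 0}
used = train_with_prefetch(counting_generator(counter), steps_per_epoch=5, max_queue_size=10)
print(len(used))            # 5 batches were trained on
print(counter["produced"])  # 15 batches were pulled from the generator
```

In this model the generator is always `max_queue_size` batches ahead of training, so the counter ends at `steps_per_epoch + max_queue_size` even though only `steps_per_epoch` batches were used.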


Tensorflow Data API: repeat()

The following code is a snippet from "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow".
I understand everything in the following code except the .repeat(repeat) call chained on the second line.
I know that repeat repeats the dataset elements (in this case, the file paths), and that if the argument is set to None or left empty, the repetition continues forever until whatever consumes the dataset decides when to stop.
As you can see in the code below, the author sets the repeat() argument to None for the training set:
1. Basically, I want to know why the author decided to do this.
2. Or is it because the code is trying to simulate a situation in which the dataset does not fit in memory? If so, should we avoid repeat() in a real situation?
import tensorflow as tf
from tensorflow import keras
def csv_reader_dataset(filepaths, repeat=1, n_readers=5,
                       n_read_threads=None, shuffle_buffer_size=10000,
                       n_parse_threads=5, batch_size=32):
    dataset = tf.data.Dataset.list_files(filepaths, seed=42).repeat(repeat)
    dataset = dataset.interleave(
        lambda filepath: tf.data.TextLineDataset(filepath).skip(1),
        cycle_length=n_readers, num_parallel_calls=n_read_threads)
    dataset = dataset.shuffle(shuffle_buffer_size)
    dataset = dataset.map(preprocess, num_parallel_calls=n_parse_threads)
    dataset = dataset.batch(batch_size)
    return dataset.prefetch(1)
train_set = csv_reader_dataset(train_filepaths, repeat=None)
valid_set = csv_reader_dataset(valid_filepaths)
test_set = csv_reader_dataset(test_filepaths)
model = keras.models.Sequential([
    keras.layers.InputLayer(input_shape=X_train.shape[-1:]),
    keras.layers.Dense(30, activation='relu'),
    keras.layers.Dense(1),
])
m_loss = keras.losses.mean_squared_error
m_optimizer = keras.optimizers.SGD(lr=1e-3)
batch_size = 32
model.compile(loss=m_loss, optimizer=m_optimizer, metrics=['accuracy'])
model.fit(train_set, steps_per_epoch=len(X_train) // batch_size, epochs=10, validation_data=valid_set)
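To see what `.repeat(None)` does to the stream that `fit` consumes, here is a plain-Python analogue. The function `repeat` below is a hypothetical stand-in, not TensorFlow code; it mirrors the documented behaviour of `Dataset.repeat(count)`, where `count=None` or `-1` means repeat indefinitely.

```python
import itertools


def repeat(elements, count=None):
    """Plain-Python analogue of Dataset.repeat(count): None or -1 repeats
    forever; otherwise the elements are replayed `count` times."""
    if count is None or count == -1:
        return itertools.cycle(elements)
    return itertools.chain.from_iterable(itertools.repeat(elements, count))


files = ["a.csv", "b.csv"]
print(list(repeat(files, 2)))  # ['a.csv', 'b.csv', 'a.csv', 'b.csv']

# With count=None the stream never ends, so the consumer must decide where
# an epoch stops; in model.fit that is what steps_per_epoch tells it.
endless = repeat(files)
print(list(itertools.islice(endless, 5)))  # ['a.csv', 'b.csv', 'a.csv', 'b.csv', 'a.csv']
```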
For your questions, I think:
The tf.data API won't easily lead to out-of-memory errors, because it loads data from the given file paths or TFRecord files (which can be compressed) as it goes. Hence, repeat() has nothing to do with memory here; instead, it is used for transforming the data stream.
I use repeat(...) when I set steps_per_epoch explicitly. Say your batch_size = 32 and steps_per_epoch = 100 // 32 = 3; this requires 3 * 32 = 96 samples per epoch, but your data has only 80 samples. In that case I use data.repeat(2) to get 160 samples in total, so that all 80 samples of the first repetition and the first 16 samples of the second repetition are used within one epoch. This prevents the "input ran out of data" error.
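The arithmetic in this answer can be written as a small helper (the name `repeats_needed` is hypothetical): the repeat count is the smallest integer r such that r * num_samples >= steps_per_epoch * batch_size.

```python
import math


def repeats_needed(num_samples, batch_size, steps_per_epoch):
    """Smallest repeat count so that one epoch of steps_per_epoch batches
    of batch_size samples does not run out of data."""
    samples_needed = steps_per_epoch * batch_size
    return math.ceil(samples_needed / num_samples)


# The example from the answer: 3 steps of 32 samples need 96 samples, but only 80 exist.
print(repeats_needed(num_samples=80, batch_size=32, steps_per_epoch=100 // 32))  # 2
```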
I also posted a copy of the same question on the book author's GitHub repository.
The issue is clarified; it was due to a bug in Keras 2.0.
Read more on: https://github.com/ageron/handson-ml2/issues/407

Keras Accuracy and Loss not changing over a large period of epochs

I am trying to create a Convolutional Neural Network to classify which language a certain "word" is from. There are two files ("english_words.txt" and "spanish_words.txt"), each of which contains about 60,000 words. I have converted each word into a 29-dimensional vector where each element is a number between 0 and 1. I am training the model for 500 epochs with the "adam" optimizer. However, when I train the model, the loss tends to hover around 0.7 and the accuracy around 0.5, and no matter how long I train it for, these metrics do not improve. Here is the code:
import keras
import numpy as np
from keras.layers import Dense
from keras.models import Sequential
import re
train_labels = []
train_data = []
with open("english_words.txt") as words:
    full_words = words.read()
    full_words = full_words.split("\n")
    # all of the labels are just 1.
    # we now need to encode them into 29 dimensional vectors.
    vector = []
    i = 0
    for word in full_words:
        for letter in word:
            vector.append((ord(letter) - 96) * (1.0 / 26.0))
            i += 1
        if (i < 29):
            # zero-pad short words up to 29 elements
            for x in range(0, 29 - i):
                vector.append(0)
        train_data.append(vector)
        train_labels.append(1)
        vector = []
        i = 0
with open("spanish_words.txt") as words:
    full_words = words.read()
    full_words = full_words.replace(' ', '')
    full_words = full_words.replace('\n', ',')
    full_words = full_words.split(",")
    vector = []
    for word in full_words:
        for letter in word:
            vector.append((ord(letter) - 96) * (1.0 / 26.0))
            i += 1
        if (i < 29):
            for x in range(0, 29 - i):
                vector.append(0)
        train_data.append(vector)
        train_labels.append(0)  # Spanish words get label 0
        vector = []
        i = 0
def shuffle_in_unison(a, b):
    # apply one random permutation to both arrays so each sample keeps its label
    assert len(a) == len(b)
    shuffled_a = np.empty(a.shape, dtype=a.dtype)
    shuffled_b = np.empty(b.shape, dtype=b.dtype)
    permutation = np.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b
train_data = np.asarray(train_data, dtype=np.float32)
train_labels = np.asarray(train_labels, dtype=np.float32)
train_data, train_labels = shuffle_in_unison(train_data, train_labels)
print(train_data.shape, train_labels.shape)
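model = Sequential()
model.add(Dense(29, input_shape=(29,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=500, batch_size=128)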
For some extra info, I am running Python 3.x with TensorFlow 1.15 and Keras 1.15 on Windows x64.
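The per-word encoding loop in the question can be expressed as a small helper. This is a sketch under stated assumptions: `encode_word` is a hypothetical name, and truncating words longer than 29 characters is an added choice (the loop in the question would produce vectors longer than 29 for such words).

```python
def encode_word(word, dim=29):
    """Map each letter to (ord(letter) - 96) / 26, so 'a' -> 1/26 and 'z' -> 1.0,
    then zero-pad to `dim` elements. Truncation to `dim` is an assumption."""
    vector = [(ord(letter) - 96) / 26.0 for letter in word[:dim]]
    vector.extend([0.0] * (dim - len(vector)))
    return vector


v = encode_word("abz")
print(len(v))  # 29
print(v[2])    # 1.0
```

Note that this formula only stays within [0, 1] for lowercase a to z: uppercase letters give negative values, and accented letters such as 'ñ' (code point 241) give values well above 1.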
I can see several potential problems with your code.
You added several Dense layers one after another, but you really need to also include a non-linear activation function with the parameter activation= .... In the absence of any non-linear activation functions, all those fully-connected Dense layers will mathematically collapse into one single linear Dense layer incapable of learning a non-linear decision boundary.
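The collapse argument can be checked numerically: two stacked layers without an activation compute W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is itself a single affine layer. A plain-Python sketch with made-up weights:

```python
def matmul(A, B):
    """Matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


# Two "Dense" layers with no activation: 2 inputs -> 3 units -> 1 output.
W1, b1 = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.0]], [0.1, -0.2, 0.3]
W2, b2 = [[2.0, -1.0, 4.0]], [1.0]
x = [0.7, -1.5]

two_layers = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The same mapping as ONE linear layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)

print(two_layers, one_layer)  # equal up to floating-point rounding
```

Inserting a non-linearity such as ReLU between the two layers breaks this identity, which is why the answer recommends setting `activation=`.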
In general, if you see your loss and accuracy not making any improvement or even getting worse, then the first thing to try is to reduce your learning rate.
You don't necessarily need to implement your own shuffling function. The Keras fit() function can do it if you use the shuffle=True parameter (which is the default).
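Whether you rely on `shuffle=True` or write your own function, the key point is that data and labels must be reordered with one shared permutation. A standard-library sketch (the function name and the optional `seed` parameter are hypothetical):

```python
import random


def shuffle_in_unison(data, labels, seed=None):
    """Apply the same random permutation to data and labels."""
    assert len(data) == len(labels)
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    return [data[i] for i in order], [labels[i] for i in order]


X = ["hello", "hola", "world", "mundo"]
y = [1, 0, 1, 0]
Xs, ys = shuffle_in_unison(X, y, seed=0)
print(list(zip(Xs, ys)))  # the same (word, label) pairs, in a new order
```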
In addition to the points mentioned by stackoverflowuser2010:
I find this a very good read and highly suggest checking the mentioned points: 37 Reasons why your Neural Network is not working
Center your input data: Compute a component-wise mean vector and subtract it from every input.
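The centering step can be sketched in plain Python (the helper name `center` is hypothetical). Computing the mean on the training data and reusing that same mean for validation and test inputs is the usual practice.

```python
def center(rows):
    """Subtract the component-wise mean vector from every input row.
    Returns the centered rows and the mean, so it can be reused on new data."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[value - m for value, m in zip(row, mean)] for row in rows]
    return centered, mean


X = [[0.0, 1.0, 0.5], [0.5, 0.0, 1.0], [1.0, 0.5, 0.0]]
X_centered, mu = center(X)
print(mu)  # [0.5, 0.5, 0.5]
```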

Keras Stateful LSTM get low accuracy when testing on training set

Generally, I use the stateful LSTM to make predictions. When I train the LSTM, the output accuracy is quite high. However, when I test the LSTM model on the training set, the accuracy is low! That really confused me; I thought they should be the same. Here are my code and the outputs. Does anyone know why this happens? Thank you!
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
adam = keras.optimizers.Adam(lr=0.0001)
model.add(LSTM(512, batch_input_shape=(12, 1, 120), return_sequences=False, stateful=True))
model.add(Dense(8, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
print('Train...')
for epoch in range(30):
    mean_tr_acc = []
    mean_tr_loss = []
    current_data, current_label, origin_label, is_shuffled = next(train_iter)
    for i in range(current_data.shape[1]):
        if i % 1000 == 0:
            print("current iter at {} with {} iteration".format(i, epoch))
        data_slice = current_data[:, i, :]
        # Data slice dim: [batch size = 12, time_step=1, feature_dim=120]
        data_slice = np.expand_dims(data_slice, axis=1)
        label_slice = current_label[:, i, :]
        one_hot_labels = keras.utils.to_categorical(label_slice, num_classes=8)
        last_element = one_hot_labels[:, -1, :]
        tr_loss, tr_acc = model.train_on_batch(np.array(data_slice), np.array(last_element))
        mean_tr_acc.append(tr_acc)
        mean_tr_loss.append(tr_loss)
    print('accuracy training = {}'.format(np.mean(mean_tr_acc)))
    print('loss training = {}'.format(np.mean(mean_tr_loss)))
    print('___________________________________')
    # Here, just evaluate the model on the training dataset
    mean_te_acc = []
    mean_te_loss = []
    for i in range(current_data.shape[1]):
        if i % 1000 == 0:
            print("current val iter at {} with {} iteration".format(i, epoch))
        data_slice = current_data[:, i, :]
        data_slice = np.expand_dims(data_slice, axis=1)
        label_slice = current_label[:, i, :]
        one_hot_labels = keras.utils.to_categorical(label_slice, num_classes=8)
        last_element = one_hot_labels[:, -1, :]
        te_loss, te_acc = model.test_on_batch(np.array(data_slice), np.array(last_element))
        mean_te_acc.append(te_acc)
        mean_te_loss.append(te_loss)
    print('accuracy testing = {}'.format(np.mean(mean_te_acc)))
    print('loss testing = {}'.format(np.mean(mean_te_loss)))
Here is the program output:
current iter at 0 with 13 iteration
current iter at 1000 with 13 iteration
accuracy training = 0.991784930229
loss training = 0.0320105217397
Batch shuffled
current val iter at 0 with 13 iteration
current val iter at 1000 with 13 iteration
accuracy testing = 0.927557885647
loss testing = 0.230829760432
OK, so here is the problem: it seems that in my code (stateful LSTM), the training error reported during training does not reflect the real training error. In other words, more iterations are needed before the model works well on the validation set (that is, before the model is really trained). Basically, this was a silly mistake :P

Always same output for tensorflow autoencoder

I am currently trying to build an autoencoder for time-series data in TensorFlow. I have nearly 500 days of data, where each day has 24 data points. Since this is my first try, my architecture is very simple: after my input of size 24, the hidden layers have sizes 10, 3, and 10, with an output of size 24 again. I normalized the data (data points are in the range [-0.5, 0.5]), and I use the sigmoid activation function and the RMSPropOptimizer.
After training (loss curve in the picture), the output is the same for every time series I feed into the network. Does anyone know the reason for that? Is it possible that my dataset class is the issue (code below)?
import numpy as np
class TimeDataset:
    def __init__(self, data):
        self._index_in_epoch = 0
        self._epochs_completed = 0
        self._data = data
        self._num_examples = data.shape[0]
    @property
    def data(self):
        return self._data
    def next_batch(self, batch_size, shuffle=True):
        start = self._index_in_epoch
        # first call
        if start == 0 and self._epochs_completed == 0:
            idx = np.arange(0, self._num_examples)  # get all possible indexes
            np.random.shuffle(idx)  # shuffle indexes
            self._data = self.data[idx]  # reorder the samples randomly
        if start + batch_size > self._num_examples:
            # not enough samples left -> finish this epoch and start the next one
            self._epochs_completed += 1
            rest_num_examples = self._num_examples - start
            data_rest_part = self.data[start:self._num_examples]
            idx0 = np.arange(0, self._num_examples)  # get all possible indexes
            np.random.shuffle(idx0)  # shuffle indexes
            self._data = self.data[idx0]  # reorder the samples randomly
            start = 0
            self._index_in_epoch = batch_size - rest_num_examples  # handle #samples not being a multiple of batch_size
            end = self._index_in_epoch
            data_new_part = self._data[start:end]
            return np.concatenate((data_rest_part, data_new_part), axis=0)
        # get next batch
        self._index_in_epoch += batch_size
        end = self._index_in_epoch
        return self._data[start:end]
*edit: here are some examples of the output (red original, blue reconstructed):
**edit: I just saw an autoencoder example with a more complicated loss function than mine. Does anyone know whether the loss function self.loss = tf.reduce_mean(tf.pow(self.X - self.decoded, 2)) is sufficient?
***edit: some more code to describe my training
This is my Autoencoder Class:
import tensorflow as tf
class AutoEncoder():
    def __init__(self):
        # Training Parameters
        self.learning_rate = 0.005
        self.alpha = 0.5
        # Network Parameters
        self.num_input = 24  # one day as input
        self.num_hidden_1 = 10  # 1st layer num features
        self.num_hidden_2 = 3  # 2nd layer num features (the latent dim)
        self.X = tf.placeholder("float", [None, self.num_input])
        self.weights = {
            'encoder_h1': tf.Variable(tf.random_normal([self.num_input, self.num_hidden_1])),
            'encoder_h2': tf.Variable(tf.random_normal([self.num_hidden_1, self.num_hidden_2])),
            'decoder_h1': tf.Variable(tf.random_normal([self.num_hidden_2, self.num_hidden_1])),
            'decoder_h2': tf.Variable(tf.random_normal([self.num_hidden_1, self.num_input])),
        }
        self.biases = {
            'encoder_b1': tf.Variable(tf.random_normal([self.num_hidden_1])),
            'encoder_b2': tf.Variable(tf.random_normal([self.num_hidden_2])),
            'decoder_b1': tf.Variable(tf.random_normal([self.num_hidden_1])),
            'decoder_b2': tf.Variable(tf.random_normal([self.num_input])),
        }
        self.encoded = self.encoder(self.X)
        self.decoded = self.decoder(self.encoded)
        # Define loss and optimizer, minimize the squared error
        self.loss = tf.reduce_mean(tf.pow(self.X - self.decoded, 2))
        self.optimizer = tf.train.RMSPropOptimizer(self.learning_rate).minimize(self.loss)
    def encoder(self, x):
        # sigmoid, tanh, relu
        en_layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, self.weights['encoder_h1']),
                                          self.biases['encoder_b1']))
        en_layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(en_layer_1, self.weights['encoder_h2']),
                                          self.biases['encoder_b2']))
        return en_layer_2
    def decoder(self, x):
        de_layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, self.weights['decoder_h1']),
                                          self.biases['decoder_b1']))
        de_layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(de_layer_1, self.weights['decoder_h2']),
                                          self.biases['decoder_b2']))
        return de_layer_2
and this is how I train my network (input data have shape (number_days, 24)):
import tensorflow as tf
import autoencoder
model = autoencoder.AutoEncoder()
num_epochs = 3
batch_size = 50
num_batches = 300
display_batch = 50
examples_to_show = 16
loss_values = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize weights and biases
    for e in range(1, num_epochs+1):
        print('starting epoch {}'.format(e))
        for b in range(num_batches):
            # get next batch of data
            batch_x = dataset.next_batch(batch_size)
            # Run cost op (to get loss value), then the optimization op (backprop)
            l = sess.run([model.loss], feed_dict={model.X: batch_x})
            sess.run(model.optimizer, feed_dict={model.X: batch_x})
            # Display logs
            if b % display_batch == 0:
                print('Epoch {}: Batch ({}) Loss: {}'.format(e, b, l))
    # testing
    test_data = dataset.next_batch(batch_size)
    decoded_test_data = sess.run(model.decoded, feed_dict={model.X: test_data})
Just a suggestion, I have had some issues with autoencoders using the sigmoid function.
I switched to tanh or relu and those improved the results.
With the autoencoder, it is basically learning to recreate the input at the output by encoding and decoding. If you mean the output is the same as the input, then you are getting what you want: it has learned the dataset.
Ultimately you can compare by reviewing the Mean Squared Error between the input and output and see if it is exactly the same. If you mean that the output is exactly the same regardless of the input, that isn't something I've run into. I guess if your input doesn't vary much from day to day, then I could imagine that would have some impact. Are you looking for anomalies?
Also, if you have a time series for training, I wouldn't shuffle the data in this particular case. If the temporal order is significant, you introduce data leakage (basically introducing future data into the training set) depending on what you are trying to achieve.
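If you keep the data in temporal order, the batching can still wrap around at the end of an epoch the way `TimeDataset.next_batch` does, just without shuffling. A minimal sketch, assuming plain lists (the generator name `sequential_batches` is hypothetical):

```python
def sequential_batches(data, batch_size):
    """Yield batches in temporal order, wrapping around at the end of the
    data instead of reshuffling."""
    n = len(data)
    start = 0
    while True:
        batch = [data[(start + k) % n] for k in range(batch_size)]
        start = (start + batch_size) % n
        yield batch


gen = sequential_batches(list(range(5)), batch_size=2)
print([next(gen) for _ in range(4)])  # [[0, 1], [2, 3], [4, 0], [1, 2]]
```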
Ah, I didn't initially see your post with the graph results. Thanks for adding it.
The sigmoid output is floored at 0, so it cannot reproduce your data that is below 0.
If you want to use a sigmoid output, then rescale your data between ]0;1[ (0 and 1 excluded).
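The rescaling suggested above can be sketched in plain Python. The margin `eps=0.01` and the helper name `rescale_open_unit` are assumptions for illustration; the sketch also assumes the values are not all equal (otherwise the span is zero). Keep `lo` and `hi` from the training data if you need to map reconstructions back to the original scale.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rescale_open_unit(values, eps=0.01):
    """Min-max rescale into [eps, 1 - eps], i.e. strictly inside (0, 1)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [eps + (1 - 2 * eps) * (v - lo) / span for v in values]


# A sigmoid output can only produce values between 0 and 1 ...
print(all(0.0 < sigmoid(z) < 1.0 for z in (-10, 0, 10)))  # True
# ... so data in [-0.5, 0.5] is mapped into that range first.
day = [-0.5, -0.1, 0.0, 0.3, 0.5]
print(rescale_open_unit(day))  # every value now lies in [0.01, 0.99]
```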
I know this is a very old post, so this is just an attempt to help whoever wanders here with the same problem. If the autoencoder converges to the same encoding for all the different instances, there may be a problem in the loss function. Check the size and shape of what the loss function returns, as it may be evaluating the wrong tensors (i.e., you may need to transpose something somewhere). Basically, assuming you are using the autoencoder to encode M features of N training instances, your loss function should return N values: the size of your loss tensor should be the number of instances in your training set. I found that out the hard way.
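The shape check from this answer can be illustrated without TensorFlow: with N instances of M features, a per-instance loss reduces over the feature axis and returns N values, while reducing over the wrong axis silently returns M values instead. A plain-Python sketch with toy data (both function names are hypothetical):

```python
def per_instance_mse(X, X_hat):
    """Mean squared error for each of the N instances (rows), averaged over
    the M features, so the result has N entries."""
    return [sum((a - b) ** 2 for a, b in zip(row, row_hat)) / len(row)
            for row, row_hat in zip(X, X_hat)]


def per_feature_mse(X, X_hat):
    """The mistake to watch for: averaging over instances gives M entries."""
    n = len(X)
    return [sum((X[i][j] - X_hat[i][j]) ** 2 for i in range(n)) / n
            for j in range(len(X[0]))]


X = [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]      # N=3, M=4
X_hat = [[0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
print(len(per_instance_mse(X, X_hat)))  # 3: one loss per instance
print(len(per_feature_mse(X, X_hat)))   # 4: one per feature (wrong axis)
```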

Tensorflow train.batch issue

I have a dataset with 40 feature values for each item.
When I try to build a neural network using TensorFlow (I am new to TensorFlow), this line of the code raises an error:
for _ in range(n_batches):
    batches = tf.train.batch(input_list, batch_size=batch_size, enqueue_many=True, capacity=3)
ValueError: Dimensions 1 and 40 are not compatible
The input list is built by reading in the CSV file, which contains 40 feature values per item:
import csv
with open('0.csv') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for row in spamreader:
        ...  # each row's 40 feature values go into input_list
As a sample, the csv file looks like this:
You should convert the input list to a single array first. You can supply a list of tensors/arrays to tf.train.batch, but then every tensor will be split into batches of size 40. Currently you are supplying a list of tensors that each have a batch size of 1, and you are asking to create batches of size 40 from each of these tensors. As you cannot create 40 examples from 1 example, you get a dimension mismatch. So instead do something like:
import numpy as np
import tensorflow as tf
input_list = np.array(input_list)
labels = np.array(labels)
batch = tf.train.batch([input_list, labels], batch_size=batch_size, enqueue_many=True, capacity=3)
inputs = batch[0]
labels = batch[1]
You can then use inputs and labels to define your network and loss function. For instance (just an illustration, code not tested):
hidden = tf.layers.dense(inputs, 2)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=hidden))
optimizer = tf.train.AdamOptimizer()
train_op = optimizer.minimize(loss)
Also note that you need to call tf.train.batch only once, not in a loop. Every time an operation that requires inputs or labels is run, inputs and labels will evaluate to a different batch.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize the dense layer's weights
    with tf.contrib.slim.queues.QueueRunners(sess):
        for i in range(n_batches):
            _, loss_value = sess.run([train_op, loss])
            print("Loss for batch {}: {}".format(i, loss_value))
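To see why the array's first dimension matters with `enqueue_many=True`, here is a plain-Python analogue. It follows the documented rule that, with `enqueue_many=True`, the first dimension of every tensor indexes examples and must be the same size across tensors; `batch_examples` is a hypothetical name, not TensorFlow's queue implementation, and this sketch only emits full batches.

```python
def batch_examples(tensors, batch_size):
    """Rough analogue of enqueue_many=True batching: dimension 0 of every
    tensor indexes examples, so all tensors must share that size."""
    n = len(tensors[0])
    if any(len(t) != n for t in tensors):
        raise ValueError("all tensors must have the same number of examples")
    for start in range(0, n - batch_size + 1, batch_size):
        yield [t[start:start + batch_size] for t in tensors]


inputs = [[float(j) for j in range(40)] for _ in range(6)]  # 6 items x 40 features
labels = [0, 1, 0, 1, 1, 0]
batches = list(batch_examples([inputs, labels], batch_size=3))
print(len(batches))        # 2
print(len(batches[0][0]))  # 3 examples per batch, each with 40 features
```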
You can also pass input_list wrapped in a list of tensors to tf.train.batch:
for _ in range(n_batches):
    batches = tf.train.batch([input_list], batch_size=batch_size,
                             enqueue_many=True, capacity=3)

