Tensorflow feed multiple parameters through input_fn pipeline - python

I am writing a high-level TensorFlow application exactly the same way this MNIST estimator is built, except that I am building a simple RNN that predicts sequences. I am new to TensorFlow, so I am trying to get my head around an issue that might actually be simple for people who have worked with the TensorFlow high-level API before.
Here is a snippet of my code to give an idea:
def main(argv=None):
    """Run the training experiment."""
    ....
    # Setup the Estimator
    model_estimator = build_estimator(config, params)
    # Setup and start training and validation
    train_spec = tf.estimator.TrainSpec(
        input_fn=lambda: get_train_inputs(128),
        max_steps=2000)
    ...
    tf.estimator.train_and_evaluate(model_estimator, train_spec, eval_spec)
def build_estimator(config, params):
    return tf.estimator.Estimator(
        model_fn=model_fn,
        config=config,
        params=params,
    )
def model_fn(features, mode, params):
    # Input data
    _inputs = tf.placeholder(tf.int32, shape=[batch_size, times_steps])
    _labels = tf.placeholder(tf.float32, shape=[batch_size, num_classes])
    # Sequence lengths for dynamic allocation
    _seqlens = tf.placeholder(tf.int32, shape=[batch_size])
    ...
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,
        loss=loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops
    )
Here is my input pipeline function:
# Get train inputs function
def get_train_inputs(batch_size):
    def train_inputs(batch_size):
        # Build dataset iterator
        x_batch, y_batch, seqlen_batch = sequence_generator.get_sentence_batch(
            batch_size, sequence_generator.train_x, sequence_generator.train_y, sequence_generator.train_seqlens)
        features = {'_inputs': x_batch, '_labels': y_batch, '_seqlens': seqlen_batch}
        return features
    return train_inputs(batch_size)
Due to the size of my code, I have only pasted relevant pieces of code here.
The problem here is that during:
train_spec = tf.estimator.TrainSpec(
    input_fn=lambda: get_train_inputs(128),
    max_steps=2000)
get_train_inputs(128) feeds the features dictionary into the _inputs placeholder of the model_fn, so _labels and _seqlens remain empty and execution fails with an error saying that no values were specified for these placeholders. The model_fn only accepts two input parameters: features and labels. How do I feed all three parameters, _inputs, _labels and _seqlens, into the model?
Any suggestions will be highly appreciated.
NOTE: The reason for inputting a third parameter, _seqlens, is that I am using tf.nn.dynamic_rnn in my model_fn, which requires sequence lengths, whereas the labels are used in tf.nn.softmax_cross_entropy_with_logits in my softmax function.

You're not supposed to use placeholders at all with tf.Estimator. You should look into the tf.data API (see here). Your input function should return the get_next op of a one-shot iterator. Apologies if you are already doing this, but it is not clear from your code what exactly your input function is returning.
Assuming you set this up to return a dict as in your example, you will then be able to simply use _inputs = features["_inputs"] etc. in your model function.
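A minimal sketch of that setup, assuming TF 2.x with eager tf.data. An Estimator input_fn may also return a tf.data.Dataset of (features, labels) pairs: the dict becomes model_fn's features argument and the second element its labels argument (so model_fn should be declared as model_fn(features, labels, mode, params)), while _seqlens simply travels inside the dict. The arrays are hypothetical stand-ins for sequence_generator.train_x/train_y/train_seqlens, and unpack_inputs is a hypothetical helper showing what the body of model_fn would do instead of creating placeholders.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for sequence_generator.train_x / train_y / train_seqlens
NUM_EXAMPLES, TIME_STEPS, NUM_CLASSES = 10, 6, 2
rng = np.random.default_rng(0)
train_x = rng.integers(0, 100, size=(NUM_EXAMPLES, TIME_STEPS)).astype(np.int32)
train_y = np.eye(NUM_CLASSES, dtype=np.float32)[rng.integers(0, NUM_CLASSES, NUM_EXAMPLES)]
train_seqlens = rng.integers(1, TIME_STEPS + 1, size=NUM_EXAMPLES).astype(np.int32)


def get_train_inputs(batch_size):
    """Return an input_fn whose Dataset yields (features_dict, labels) pairs."""
    def input_fn():
        features = {"_inputs": train_x, "_seqlens": train_seqlens}
        dataset = tf.data.Dataset.from_tensor_slices((features, train_y))
        return dataset.shuffle(NUM_EXAMPLES).repeat().batch(batch_size)
    return input_fn


def unpack_inputs(features, labels):
    """What the body of model_fn(features, labels, mode, params) would do."""
    inputs = features["_inputs"]    # replaces the _inputs placeholder
    seqlens = features["_seqlens"]  # e.g. sequence_length= for tf.nn.dynamic_rnn
    return inputs, seqlens, labels  # labels: e.g. for softmax_cross_entropy_with_logits


# With an Estimator you would pass input_fn=get_train_inputs(128) to tf.estimator.TrainSpec.
# Here we just pull one batch eagerly to see what model_fn would receive.
features, labels = next(iter(get_train_inputs(4)()))
inputs, seqlens, labels = unpack_inputs(features, labels)
print(inputs.shape, seqlens.shape, labels.shape)
```

Because the labels come from the dataset, there is no separate _labels entry in the dict anymore.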

In addition to @xdurch0's answer, use FeatureColumns: tf.feature_column describes the features of the dataset that are passed as inputs into the Estimator's model_fn for training and evaluation.
Within the model_fn, use the method tf.feature_column.input_layer() to return a dense Tensor as an input layer based on the specified FeatureColumns.
You can see examples of working with FeatureColumns here.

Related

Implement a custom loss function in Tensorflow BoostedTreesEstimator

I'm trying to implement a boosting model using Tensorflow "BoostedTreesRegressor".
For that, I need to implement a custom loss function where during training, the loss will be calculated according to the logic defined in my custom function rather than using the usual mean_squared_error.
I read in articles that this can be implemented using the interface, "BoostedTreesEstimator" by specifying a head. So, I tried to implement my model as follows:
#define custom loss function to calculate smape
def custom_loss_fn(labels, logits):
    return (np.abs(logits - labels) / (np.abs(logits) + np.abs(labels))) * 2
#create input functions
def make_input_fn(X, y, n_epochs=None, shuffle=True):
    def input_fn():
        dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
        if shuffle:
            dataset = dataset.shuffle(NUM_EXAMPLES)
        dataset = dataset.repeat(n_epochs)
        dataset = dataset.batch(NUM_EXAMPLES)
        return dataset
    return input_fn
train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dfeval, y_eval, n_epochs=1, shuffle=False)
my_head = tf.estimator.RegressionHead(loss_fn=custom_loss_fn)
#Training a boosted trees model
est = tf.estimator.BoostedTreesEstimator(feature_columns,
                                         head=my_head,
                                         n_batches_per_layer=1,
                                         n_trees=90,
                                         max_depth=2)
est.train(train_input_fn, max_steps=100)
predictions = list(est.predict(eval_input_fn))
This code produced the following error:
NotImplementedError: Subclasses of Head must implement create_estimator_spec() or _create_tpu_estimator_spec().
As I read in articles, create_estimator_spec() is used when we define a model_fn() when creating a new Estimator. Here, I do not want to create any new models or Estimators; I only want to use a custom loss function (instead of the default mean squared error) during training, while the model itself should stay the same as BoostedTreesRegressor/BoostedTreesEstimator.
It would be a great help if anybody could give me a hint on how to implement this model.
Make sure you aren't using numpy functions in your loss function--you cannot convert tensors to numpy arrays. Try replacing np.abs with tf.abs. You might be getting the NotImplementedError because your loss function is breaking.
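A sketch of the question's SMAPE loss rewritten with TensorFlow ops only, keeping the (labels, logits) signature used above. tf.math.divide_no_nan is swapped in as an assumption about the desired behaviour, so that a term where both prediction and label are 0 yields 0 instead of NaN. This only removes the NumPy-on-tensors problem; it does not verify that BoostedTreesEstimator accepts the custom head.

```python
import tensorflow as tf


def custom_loss_fn(labels, logits):
    """Per-example SMAPE term 2 * |y_hat - y| / (|y_hat| + |y|), built from TF ops."""
    labels = tf.cast(labels, logits.dtype)  # labels may arrive with a different dtype
    numerator = tf.abs(logits - labels)
    denominator = tf.abs(logits) + tf.abs(labels)
    # divide_no_nan returns 0 wherever the denominator is 0 (both values are 0)
    return 2.0 * tf.math.divide_no_nan(numerator, denominator)


loss = custom_loss_fn(tf.constant([[1.0], [0.0]]), tf.constant([[3.0], [0.0]]))
print(loss.numpy())  # 1.0 for the first row (2 * 2 / 4), 0.0 for the second
```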

Custom Neural Network Implementation on MNIST using Tensorflow 2.0?

I tried to write a custom implementation of a basic neural network with two hidden layers on the MNIST dataset using *TensorFlow 2.0 beta*, but I'm not sure what went wrong: my training loss and accuracy seem to be stuck at about 1.5 and 85% respectively. If I build the model using Keras, I get a very low training loss and accuracy above 95% with just 8-10 epochs.
I believe that maybe I'm not updating my weights or something? Do I need to assign the new weights I compute in the backprop function back to their respective weight/bias variables?
I would really appreciate it if someone could help me out with this and with the few more questions that I've listed below.
A few more questions:
1) How do I add Dropout and Batch Normalization layers in this custom implementation? (i.e. making them work at both train and test time)
2) How can I use callbacks in this code? (i.e. making use of the EarlyStopping and ModelCheckpoint callbacks)
3) Is there anything else in my code below that I can optimize further, like making use of the TensorFlow 2.x @tf.function decorator, etc.?
4) I would also need to extract the final weights for plotting and checking their distributions, to investigate issues like vanishing or exploding gradients (e.g. maybe with TensorBoard).
5) I also want help writing this code in a more generalized way, so I can easily implement other networks like ConvNets (i.e. Conv, MaxPool, etc.) based on it.
Here's my full code for easy reproducibility :
Note: I know I can use high-level API like Keras to build the model much easier but that is not my goal here. Please understand.
import numpy as np
import os
import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)
import tensorflow as tf
import tensorflow_datasets as tfds
(x_train, y_train), (x_test, y_test) = tfds.load('mnist', split=['train', 'test'],
                                                  batch_size=-1, as_supervised=True)
# reshaping
x_train = tf.reshape(x_train, shape=(x_train.shape[0], 784))
x_test = tf.reshape(x_test, shape=(x_test.shape[0], 784))
ds_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# rescaling
ds_train = ds_train.map(lambda x, y: (tf.cast(x, tf.float32)/255.0, y))
class Model(object):
    def __init__(self, hidden1_size, hidden2_size, device=None):
        # layer sizes along with input and output
        self.input_size, self.output_size, self.device = 784, 10, device
        self.hidden1_size, self.hidden2_size = hidden1_size, hidden2_size
        self.lr_rate = 1e-03
        # weights initialization
        self.glorot_init = tf.initializers.glorot_uniform(seed=42)
        # weights b/w input to hidden1 --> 1
        self.w_h1 = tf.Variable(self.glorot_init((self.input_size, self.hidden1_size)))
        # weights b/w hidden1 to hidden2 ---> 2
        self.w_h2 = tf.Variable(self.glorot_init((self.hidden1_size, self.hidden2_size)))
        # weights b/w hidden2 to output ---> 3
        self.w_out = tf.Variable(self.glorot_init((self.hidden2_size, self.output_size)))
        # bias initialization
        self.b1 = tf.Variable(self.glorot_init((self.hidden1_size,)))
        self.b2 = tf.Variable(self.glorot_init((self.hidden2_size,)))
        self.b_out = tf.Variable(self.glorot_init((self.output_size,)))
        self.variables = [self.w_h1, self.b1, self.w_h2, self.b2, self.w_out, self.b_out]
    def feed_forward(self, x):
        if self.device is not None:
            with tf.device('gpu:0' if self.device=='gpu' else 'cpu'):
                # layer1
                self.layer1 = tf.nn.sigmoid(tf.add(tf.matmul(x, self.w_h1), self.b1))
                # layer2
                self.layer2 = tf.nn.sigmoid(tf.add(tf.matmul(self.layer1,
                                                             self.w_h2), self.b2))
                # output layer
                self.output = tf.nn.softmax(tf.add(tf.matmul(self.layer2,
                                                             self.w_out), self.b_out))
        return self.output
    def loss_fn(self, y_pred, y_true):
        self.loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_true,
                                                                   logits=y_pred)
        return tf.reduce_mean(self.loss)
    def acc_fn(self, y_pred, y_true):
        y_pred = tf.cast(tf.argmax(y_pred, axis=1), tf.int32)
        y_true = tf.cast(y_true, tf.int32)
        predictions = tf.cast(tf.equal(y_true, y_pred), tf.float32)
        return tf.reduce_mean(predictions)
    def backward_prop(self, batch_xs, batch_ys):
        optimizer = tf.keras.optimizers.Adam(learning_rate=self.lr_rate)
        with tf.GradientTape() as tape:
            predicted = self.feed_forward(batch_xs)
            step_loss = self.loss_fn(predicted, batch_ys)
        grads = tape.gradient(step_loss, self.variables)
        optimizer.apply_gradients(zip(grads, self.variables))
n_shape = x_train.shape[0]
epochs = 20
batch_size = 128
ds_train = ds_train.repeat().shuffle(n_shape).batch(batch_size).prefetch(batch_size)
neural_net = Model(512, 256, 'gpu')
for epoch in range(epochs):
    no_steps = n_shape//batch_size
    avg_loss = 0.
    avg_acc = 0.
    for (batch_xs, batch_ys) in ds_train.take(no_steps):
        preds = neural_net.feed_forward(batch_xs)
        avg_loss += float(neural_net.loss_fn(preds, batch_ys)/no_steps)
        avg_acc += float(neural_net.acc_fn(preds, batch_ys) /no_steps)
        neural_net.backward_prop(batch_xs, batch_ys)
    print(f'Epoch: {epoch}, Training Loss: {avg_loss}, Training ACC: {avg_acc}')
# output for 10 epochs:
Epoch: 0, Training Loss: 1.7005115111824125, Training ACC: 0.7603832868262543
Epoch: 1, Training Loss: 1.6052448933478445, Training ACC: 0.8524806404020637
Epoch: 2, Training Loss: 1.5905528008006513, Training ACC: 0.8664196092868224
Epoch: 3, Training Loss: 1.584107405738905, Training ACC: 0.8727630912326276
Epoch: 4, Training Loss: 1.5792385798413306, Training ACC: 0.8773203844903037
Epoch: 5, Training Loss: 1.5759121985174716, Training ACC: 0.8804754322627559
Epoch: 6, Training Loss: 1.5739163148682564, Training ACC: 0.8826455712551251
Epoch: 7, Training Loss: 1.5722616605926305, Training ACC: 0.8840812018606812
Epoch: 8, Training Loss: 1.569699136307463, Training ACC: 0.8867688354803249
Epoch: 9, Training Loss: 1.5679460542742163, Training ACC: 0.8885049475356936
I wondered where to start with your multi-question, and I decided to do so with a statement:
Your code definitely should not look like that and is nowhere near current TensorFlow best practices.
Sorry, but debugging it step by step is a waste of everyone's time and would not benefit either of us.
Now, moving to the third point:
Is there anything else in my code below that I can optimize further
in this code like maybe making use of tensorflow 2.x @tf.function
decorator etc.)
Yes, you can use TensorFlow 2.0 functionality, and it seems like you are running away from it (the tf.function decorator is of no real use here, leave it for the time being).
Following new guidelines would alleviate your problems with your 5th point as well, namely:
I also want help in writing this code in a more generalized way so
I can easily implement other networks like ConvNets (i.e Conv, MaxPool
etc.) based on this code easily.
as it's designed specifically for that. After a little introduction I will try to introduce you to those concepts in a few steps:
1. Divide your program into logical parts
TensorFlow did much harm when it comes to code readability; everything in tf1.x was usually crammed into one place: globals followed by function definitions followed by more globals or maybe data loading; all in all, a mess. It's not really the developers' fault, as the system's design encouraged those practices.
Now, in tf2.0 the programmer is encouraged to divide their work similarly to the structure one can see in PyTorch, Chainer and other more user-friendly frameworks.
1.1 Data loading
You were on a good path with TensorFlow Datasets, but you turned away for no apparent reason.
Here is your code with commentary on what's going on:
# tfds.load already gives you tf.data.Dataset objects if you drop batch_size=-1
# (with batch_size=-1 you get each whole split as one big tensor instead)
(x_train, y_train), (x_test, y_test) = tfds.load('mnist', split=['train', 'test'],
                                                  batch_size=-1, as_supervised=True)
# But you are reshaping them in a strange manner...
x_train = tf.reshape(x_train, shape=(x_train.shape[0], 784))
x_test = tf.reshape(x_test, shape=(x_test.shape[0], 784))
# And building from slices...
ds_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Unreadable rescaling (there are built-ins for that)
You can easily generalize this idea to any dataset; place this in a separate module, say datasets.py:
import tensorflow as tf
import tensorflow_datasets as tfds
class ImageDatasetCreator:
    # More portable and readable than dividing by 255
    @classmethod
    def _convert_image_dtype(cls, dataset):
        return dataset.map(
            lambda image, label: (
                tf.image.convert_image_dtype(image, tf.float32),
                label,
            )
        )
    def __init__(self, name: str, batch: int, cache: bool = True, split=None,
                 shuffle_buffer: int = 10000):
        # Load dataset, every dataset has default train, test split
        dataset = tfds.load(name, as_supervised=True, split=split)
        # Convert to float range
        try:
            self.train = ImageDatasetCreator._convert_image_dtype(dataset["train"])
            self.test = ImageDatasetCreator._convert_image_dtype(dataset["test"])
        except KeyError as exception:
            raise ValueError(
                f"Dataset {name} does not have train and test, write your own custom dataset handler."
            ) from exception
        if cache:
            self.train = self.train.cache()  # speed things up considerably
            self.test = self.test.cache()
        self.batch: int = batch
        # shuffle() requires a buffer size; 10000 is an arbitrary default
        self.shuffle_buffer: int = shuffle_buffer
    def get_train(self):
        return self.train.shuffle(self.shuffle_buffer).batch(self.batch).repeat()
    def get_test(self):
        return self.test.batch(self.batch).repeat()
So now you can load more than MNIST using a simple command:
from datasets import ImageDatasetCreator
if __name__ == "__main__":
    dataloader = ImageDatasetCreator("mnist", batch=64, cache=True)
    train, test = dataloader.get_train(), dataloader.get_test()
And you could use any name other than mnist you want to load datasets from now on.
Please stop making everything deep-learning related a one-off script; you are a programmer as well.
1.2 Model creation
Since tf2.0 there are two advised ways to proceed, depending on the model's complexity:
tensorflow.keras.models.Sequential - this way was shown by @Stewart_R, no need to reiterate his points. Used for the simplest models (you should use this one for your feedforward network).
Inheriting tensorflow.keras.Model and writing a custom model. This one should be used when you have some kind of logic inside your module or it's more complicated (things like ResNets, multipath networks etc.). All in all, it's more readable and customizable.
Your Model class tried to resemble something like that, but it went south again: backprop definitely is not part of the model itself, and neither are loss or accuracy. Separate them into another module or function; definitely not a member!
That said, let's code the network using the second approach (you should place this code in model.py). Before that, I will code a YourDense feedforward layer from scratch by inheriting from tf.keras.layers.Layer (this one might go into a layers.py module):
import tensorflow as tf
class YourDense(tf.keras.layers.Layer):
    def __init__(self, units):
        # It's Python 3, you don't have to specify super parents explicitly
        super().__init__()
        self.units = units
    # Use build to create variables, as shape can be inferred from previous layers
    # If you were to create layers in __init__, one would have to provide input_shape
    # (same as it occurs in PyTorch for example)
    def build(self, input_shape):
        # You could use different initializers here as well
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        # You could define bias in __init__ as well as it's not input dependent
        self.bias = self.add_weight(shape=(self.units,), initializer="random_normal")
        # Oh, trainable=True is default
    def call(self, inputs):
        # Use overloaded operators instead of tf.add, better readability
        return tf.matmul(inputs, self.kernel) + self.bias
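A quick check of the lazy build() behaviour described in the comments above (the layer is repeated so the snippet runs on its own; the 2x4 input is an arbitrary choice): no variables exist until the first call, and the kernel's first dimension is then taken from input_shape[-1].

```python
import tensorflow as tf


class YourDense(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # input_shape[-1] is only known once the layer sees its first input
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True
        )
        self.bias = self.add_weight(shape=(self.units,), initializer="random_normal")

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel) + self.bias


layer = YourDense(3)
print(layer.built)            # False: build() has not run yet
out = layer(tf.ones((2, 4)))  # the first call triggers build() with input_shape (2, 4)
print(layer.built, layer.kernel.shape, out.shape)
```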
Regarding your
How to add a Dropout and Batch Normalization layer in this custom
implementation? (i.e making it work for both train and test time)
I suppose you would like to create a custom implementation of those layers.
If not, you can just use from tensorflow.keras.layers import Dropout and put it anywhere you want, as @Leevo pointed out.
Inverted dropout with different behaviour during train and test below:
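import tensorflow as tf
class CustomDropout(tf.keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super().__init__(**kwargs)
        self.rate = rate
    def call(self, inputs, training=None):
        if training:
            # You could simply create a binary mask and multiply here;
            # tf.nn.dropout also scales the kept units by 1 / (1 - rate)
            return tf.nn.dropout(inputs, rate=self.rate)
        # Thanks to that training-time scaling, inputs pass through unchanged at test time
        # (non-inverted dropout would instead multiply by the keep probability 1 - rate here)
        return inputs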
Layers taken from here and modified to better fit showcasing purpose.
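To see the train/test difference, one can call the layer with training set explicitly (class repeated so the snippet is self-contained; rate=0.5 and the all-ones input are arbitrary choices). At test time the input passes through untouched, while at train time every unit is either zeroed or scaled by 1 / (1 - 0.5) = 2.

```python
import numpy as np
import tensorflow as tf


class CustomDropout(tf.keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super().__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            # Inverted dropout: zero units with probability `rate`, scale the rest by 1 / (1 - rate)
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs


layer = CustomDropout(0.5)
x = tf.ones((4, 8))
test_out = layer(x, training=False).numpy()  # identical to x
train_out = layer(x, training=True).numpy()  # every entry is 0.0 or 2.0
print(np.unique(train_out))
```

Inside model.fit() and model.evaluate(), Keras sets training for you, so the flag only has to be passed by hand in custom loops.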
Now you can create your model finally (simple double feedforward):
import tensorflow as tf
from layers import YourDense
class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Use Sequential here for readability
        self.network = tf.keras.Sequential(
            [YourDense(100), tf.keras.layers.ReLU(), YourDense(10)]
        )
    def call(self, inputs):
        # You can use non-parametric layers inside call as well
        flattened = tf.keras.layers.Flatten()(inputs)
        return self.network(flattened)
Of course, you should use built-ins as much as possible in general implementations.
This structure is pretty extensible, so generalization to convolutional nets, resnets, senets, whatever should be done via this module. You can read more about it here.
I think it fulfills your 5th point:
I also want help in writing this code in a more generalized way so
I can easily implement other networks like ConvNets (i.e Conv, MaxPool
etc.) based on this code easily.
Last thing, you may have to use model.build(shape) in order to build your model's graph.
model.build((None, 28, 28, 1))
This would be for MNIST's 28x28x1 input shape, where None stands for batch.
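As an alternative to model.build(...), calling the model once on a dummy MNIST-shaped batch also creates all of its variables. This sketch repeats the layer and model so it runs on its own, with one small design change from the code above: Flatten is created once in __init__ instead of on every call. The parameter count is 784*100 + 100 = 78,500 for the first YourDense plus 100*10 + 10 = 1,010 for the second.

```python
import tensorflow as tf


class YourDense(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True
        )
        self.bias = self.add_weight(shape=(self.units,), initializer="random_normal")

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel) + self.bias


class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()  # created once instead of on every call
        self.network = tf.keras.Sequential(
            [YourDense(100), tf.keras.layers.ReLU(), YourDense(10)]
        )

    def call(self, inputs):
        return self.network(self.flatten(inputs))


model = Model()
logits = model(tf.zeros((1, 28, 28, 1)))  # dummy batch builds every layer
print(logits.shape, model.count_params())  # logits (1, 10); 78,500 + 1,010 = 79,510 parameters
```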
1.3 Training
Once again, training could be done in two separate ways:
standard Keras model.fit(dataset) - useful in simple tasks like classification
tf.GradientTape - more complicated training schemes; the most prominent example would be Generative Adversarial Networks, where two models optimize opposing goals, playing a minimax game
As pointed out by @Leevo once again, if you use the second way, you won't be able to simply use the callbacks provided by Keras, hence I'd advise sticking with the first option whenever possible.
In theory you could call the callbacks' methods like on_batch_begin() and others manually where needed, but it would be cumbersome and I'm not sure how well this would work.
When it comes to the first option, you can use tf.data.Dataset objects directly with fit. Here it is presented inside another module (preferably train.py):
import datetime
import pathlib
import tensorflow as tf
def train(
    model: tf.keras.Model,
    path: str,
    train: tf.data.Dataset,
    epochs: int,
    steps_per_epoch: int,
    validation: tf.data.Dataset,
    steps_per_validation: int,
    stopping_epochs: int,
    optimizer=None,
):
    # A default argument of Adam() would be created once and shared by every call
    if optimizer is None:
        optimizer = tf.keras.optimizers.Adam()
    model.compile(
        optimizer=optimizer,
        # I used logits as output from the last layer, hence this
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
    )
    model.fit(
        train,
        epochs=epochs,
        steps_per_epoch=steps_per_epoch,
        validation_data=validation,
        validation_steps=steps_per_validation,
        callbacks=[
            # Tensorboard logging
            tf.keras.callbacks.TensorBoard(
                str(
                    pathlib.Path("logs")
                    / datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
                ),
                histogram_freq=1,
            ),
            # Early stopping with best weights preserving
            tf.keras.callbacks.EarlyStopping(
                monitor="val_sparse_categorical_accuracy",
                patience=stopping_epochs,
                restore_best_weights=True,
            ),
        ],
    )
    model.save(path)
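The question also asked about ModelCheckpoint, which the train() function above does not include. A sketch of adding it next to EarlyStopping; make_callbacks is a hypothetical helper, and the tiny random dataset exists only so the callbacks actually run. With save_best_only=True the file is rewritten only when val_loss improves; the .weights.h5 suffix is used because newer Keras versions require it when save_weights_only=True.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def make_callbacks(checkpoint_path, patience=3):
    return [
        # Keep only the best weights seen so far on disk
        tf.keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_loss", save_best_only=True, save_weights_only=True
        ),
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=patience, restore_best_weights=True
        ),
    ]


# Tiny random data, only to exercise the callbacks
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4)).astype("float32")
y = rng.integers(0, 3, size=(64,))
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(3)])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
ckpt = os.path.join(tempfile.mkdtemp(), "best.weights.h5")
history = model.fit(
    x, y, validation_split=0.25, epochs=2, verbose=0, callbacks=make_callbacks(ckpt)
)
print(sorted(history.history))
```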
The more complicated approach is very similar (almost copy and paste) to PyTorch training loops, so if you are familiar with those, they should not pose much of a problem.
You can find examples throughout tf2.0 docs, e.g. here or here.
2. Other things
2.1 Unanswered questions
Is there anything else in the code that I can optimize further in
this code? i.e (making use of tensorflow 2.x @tf.function decorator
etc.)
Keras' fit above already runs the model as a graph, hence I don't think you would benefit from applying @tf.function in this case. And premature optimization is the root of all evil; remember to measure your code before doing this.
You would gain much more from proper caching of data (as described at the beginning of 1.1) and a good input pipeline.
Also I need a way to extract all my final weights for all layers
after training so I can plot them and check their distributions. To
check issues like gradient vanishing or exploding.
As pointed out by @Leevo above,
weights = model.get_weights()
Would get you the weights. You may transform them into np.array and plot using seaborn, matplotlib, analyze, check or whatever else you want.
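A sketch of turning get_weights() into numbers you can inspect; weight_stats is a hypothetical helper, and the model is an untrained stand-in with the question's 784-512-256-10 layout. Each array could also be passed to matplotlib's plt.hist for plotting, and the TensorBoard callback with histogram_freq=1 in train() above shows the same distributions over epochs.

```python
import numpy as np
import tensorflow as tf


def weight_stats(model, bins=20):
    """Shape, mean, std and histogram counts for every array from get_weights()."""
    stats = []
    for w in model.get_weights():  # NumPy arrays: kernel, bias, kernel, bias, ...
        counts, _edges = np.histogram(w, bins=bins)
        stats.append((w.shape, float(w.mean()), float(w.std()), counts))
    return stats


# Untrained stand-in with the question's 784-512-256-10 layout
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="sigmoid"),
    tf.keras.layers.Dense(256, activation="sigmoid"),
    tf.keras.layers.Dense(10),
])
stats = weight_stats(model)
for shape, mean, std, _counts in stats:
    print(shape, round(mean, 4), round(std, 4))
```

A std that shrinks toward zero layer by layer, or histograms piling up at extreme values, are the kind of signals worth comparing before and after training.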
2.2 Putting it all together
All in all, your main.py (or entrypoint or something similar) would consist of this (more or less):
from datasets import ImageDatasetCreator
from model import Model
from train import train
# You could use argparse for things like batch, epochs etc.
if __name__ == "__main__":
    batch = 64
    dataloader = ImageDatasetCreator("mnist", batch=batch, cache=True)
    # Don't name these `train`: that would shadow the imported train() function
    train_ds, test_ds = dataloader.get_train(), dataloader.get_test()
    model = Model()
    model.build((None, 28, 28, 1))
    train(
        model,
        path="saved_model",  # example values from here on, provide your own
        train=train_ds,
        epochs=20,
        # Both datasets repeat forever, so len() won't work; MNIST has 60,000/10,000 images
        steps_per_epoch=60000 // batch,
        validation=test_ds,
        steps_per_validation=10000 // batch,
        stopping_epochs=3,
    )
    # Do whatever you want with those
    weights = model.get_weights()
Oh, remember that the above functions are not for copy-pasting and should be treated more like a guideline. Hit me up if you have any questions.
3. Questions from comments
3.1 How to initialize custom and built-in layers
3.1.1 TLDR what you are about to read
Custom Poisson initialization function, but it takes three arguments
The tf.keras.initializers API needs two arguments (see the last point in their docs), hence one is specified via Python's lambda inside the custom layer we have written before
Optional bias for the layer is added, which can be turned off with a boolean
Why is it so uselessly complicated? To show that in tf2.0 you can finally use Python's functionality, no more graph hassle: Python's if instead of tf.cond etc.
3.1.2 From TLDR to implementation
Keras initializers can be found here and Tensorflow's flavor here.
Please note API inconsistencies (capital letters like classes, small letters with underscore like functions), especially in tf2.0, but that's beside the point.
You can use them by passing a string (as it's done in YourDense above) or during object creation.
To allow custom initialization in your custom layers, you can simply add an additional argument to the constructor (tf.keras.Model is still a Python class and its __init__ should be used the same way as any other Python class's).
Before that, I will show you how to create custom initialization:
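# Poisson custom initialization because why not.
def my_dumb_init(shape, lam, dtype=None):
    # With a scalar lam, tf.random.poisson already returns exactly `shape`;
    # tf.squeeze would break kernels that have a dimension of size 1
    return tf.random.poisson(shape, lam, dtype=tf.float32 if dtype is None else dtype)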
Notice that its signature takes three arguments, while it should take (shape, dtype) only. Still, one can "fix" this easily while creating one's own layer, like the one below (an extended YourDense):
import typing
import tensorflow as tf
class YourDense(tf.keras.layers.Layer):
    # It's still Python, use it as Python, that's the point of tf.2.0
    @classmethod
    def register_initialization(cls, initializer):
        # Set defaults if init not provided by user
        if initializer is None:
            # let's make the signature proper for init in tf.keras
            return lambda shape, dtype=None: my_dumb_init(shape, 1, dtype)
        return initializer
    def __init__(
        self,
        units: int,
        bias: bool = True,
        # can be string or callable, some typing info added as well...
        kernel_initializer: typing.Union[str, typing.Callable, None] = None,
        bias_initializer: typing.Union[str, typing.Callable, None] = None,
    ):
        super().__init__()
        self.units: int = units
        self.kernel_initializer = YourDense.register_initialization(kernel_initializer)
        if bias:
            self.bias_initializer = YourDense.register_initialization(bias_initializer)
        else:
            self.bias_initializer = None
    def build(self, input_shape):
        # Simply pass your init here
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer=self.kernel_initializer,
            trainable=True,
        )
        if self.bias_initializer is not None:
            self.bias = self.add_weight(
                shape=(self.units,), initializer=self.bias_initializer
            )
        else:
            self.bias = None
    def call(self, inputs):
        weights = tf.matmul(inputs, self.kernel)
        if self.bias is not None:
            return weights + self.bias
        # Without this, a layer created with bias=False would return None
        return weights
I have added my_dumb_init as the default (if the user does not provide one) and made the bias optional with the bias argument. Note that you can use if freely as long as it's not data dependent. If it is (or depends on a tf.Tensor somehow), one has to use the @tf.function decorator, which changes Python's control flow to its TensorFlow counterpart (e.g. if to tf.cond).
See here for more on autograph, it's very easy to follow.
If you want to incorporate above initializer changes into your model, you have to create appropriate object and that's it.
...  # Previous code of Model here
        self.network = tf.keras.Sequential(
            [
                YourDense(100, bias=False, kernel_initializer="lecun_uniform"),
                tf.keras.layers.ReLU(),
                YourDense(10, bias_initializer=tf.keras.initializers.Ones()),
            ]
        )
...  # and the same afterwards
With built-in tf.keras.layers.Dense layers, one can do the same (argument names differ, but the idea holds).
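For the built-in Dense layer, a three-argument initializer can also be adapted with functools.partial instead of a lambda. A sketch, relying on Keras calling a callable initializer as initializer(shape, dtype=...); poisson_init is a hypothetical stand-in that mirrors my_dumb_init from above.

```python
import functools

import numpy as np
import tensorflow as tf


def poisson_init(shape, lam, dtype=None):
    # Three-argument initializer, like my_dumb_init above
    return tf.random.poisson(shape, lam, dtype=tf.float32 if dtype is None else dtype)


layer = tf.keras.layers.Dense(
    4,
    # partial fixes lam, leaving the (shape, dtype) signature Keras expects
    kernel_initializer=functools.partial(poisson_init, lam=1.0),
    bias_initializer="zeros",
)
layer(tf.ones((2, 3)))  # the first call builds the 3x4 kernel
kernel = layer.kernel.numpy()
is_whole = bool(np.all(kernel == np.floor(kernel)))  # Poisson draws are whole numbers
print(kernel, is_whole)
```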
3.2 Automatic Differentiation using tf.GradientTape
3.2.1 Intro
The point of tf.GradientTape is to allow normal Python control flow while computing gradients of one tensor (e.g. a loss) with respect to others (e.g. variables).
Example taken from here but broken into separate pieces:
def f(x, y):
    output = 1.0
    for i in range(y):
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output
Regular python function with for and if flow control statements
def grad(x, y):
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x, y)
    return t.gradient(out, x)
Using a gradient tape you can record all operations on Tensors (and their intermediate states as well) and "play" them backwards (perform automatic backward differentiation using the chain rule).
Trainable tf.Variables used inside the tf.GradientTape() context manager are watched automatically; any other Tensor (like the plain input x above) must be watched explicitly with the watch() method, as one can see above.
Finally, the gradient of the output with respect to x (the input) is returned.
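Calling grad with concrete numbers (functions repeated so the snippet runs alone): for y=6 the loop multiplies by x for i = 2, 3, 4, so f = x**3 and the gradient at x=2 is 3*2**2 = 12; for y=4 only i = 2, 3 run, so f = x**2 and the gradient is 2*2 = 4. The second part illustrates the watch() rule: a tf.Variable is tracked automatically, while an unwatched constant gets no gradient (None).

```python
import tensorflow as tf


def f(x, y):
    output = 1.0
    for i in range(y):
        if i > 1 and i < 5:  # plain Python control flow; only executed ops are recorded
            output = tf.multiply(output, x)
    return output


def grad(x, y):
    with tf.GradientTape() as t:
        t.watch(x)  # x is a constant tensor, so it must be watched explicitly
        out = f(x, y)
    return t.gradient(out, x)


x = tf.constant(2.0)
print(grad(x, 6).numpy())  # 12.0  (f = x**3)
print(grad(x, 4).numpy())  # 4.0   (f = x**2)

# Variables are watched automatically; unwatched constants are not
v = tf.Variable(2.0)
c = tf.constant(2.0)
with tf.GradientTape(persistent=True) as tape:  # persistent: gradient() is called twice
    from_var = v * v
    from_const = c * c
g_var = tape.gradient(from_var, v)
g_const = tape.gradient(from_const, c)
del tape  # release the persistent tape's resources
print(g_var.numpy(), g_const)  # 4.0 None
```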
3.2.2 Connection with deep learning
What was described above is the backpropagation algorithm. Gradients of the loss w.r.t. (with respect to) the trainable variables are calculated for each node in the network (or rather for every layer). Those gradients are then used by various optimizers to make corrections, and so it repeats.
Let's continue and assume you have your tf.keras.Model, optimizer instance, tf.data.Dataset and loss function already set up.
One can define a Trainer class which will perform training for us. Please read comments in the code if in doubt:
class Trainer:
    def __init__(self, model, optimizer, loss_function):
        self.model = model
        self.loss_function = loss_function
        self.optimizer = optimizer
        # You could pass custom metrics in constructor
        # and adjust train_step and test_step accordingly
        self.train_loss = tf.keras.metrics.Mean(name="train_loss")
        self.test_loss = tf.keras.metrics.Mean(name="test_loss")
    def train_step(self, x, y):
        # Setup tape
        with tf.GradientTape() as tape:
            # Get current predictions of network (training=True enables dropout etc.)
            y_pred = self.model(x, training=True)
            # Calculate loss generated by predictions
            loss = self.loss_function(y, y_pred)
        # Get gradients of loss w.r.t. EVERY trainable variable (iterable returned)
        gradients = tape.gradient(loss, self.model.trainable_variables)
        # Change trainable variable values according to gradient by applying optimizer policy
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        # Record loss of current step
        self.train_loss(loss)
    def train(self, dataset):
        # One epoch: reset the running mean, then perform a train step per batch
        self.train_loss.reset_state()
        for x, y in dataset:
            self.train_step(x, y)
    def test_step(self, x, y):
        # Record test loss separately
        self.test_loss(self.loss_function(y, self.model(x, training=False)))
    def test(self, dataset):
        # Iterate over whole dataset
        self.test_loss.reset_state()
        for x, y in dataset:
            self.test_step(x, y)
    def __str__(self):
        # You need Python 3.6+ for f-string support
        # Just return metrics
        return f"Loss: {self.train_loss.result()}, Test Loss: {self.test_loss.result()}"
Now, you could use this class in your code really simply like this:
EPOCHS = 5
# model, optimizer, loss defined beforehand
trainer = Trainer(model, optimizer, loss)
for epoch in range(EPOCHS):
    trainer.train(train_dataset)
    trainer.test(test_dataset)
    print(f"Epoch {epoch}: {trainer}")
Print would tell you training and test loss for each epoch. You can mix training and testing any way you want (e.g. 5 epochs for training and 1 testing), you could add different metrics etc.
See here if you want a non-OOP approach (IMO less readable, but to each their own).
Also If there's something I could improve in the code do let me know
as well.
Embrace the high-level API for something like this. You can do it in just a few lines of code and it's much easier to debug, read and reason about:
(x_train, y_train), (x_test, y_test) = tfds.load('mnist', split=['train', 'test'],
                                                  batch_size=-1, as_supervised=True)
# Flatten to 784 features and rescale pixels to [0, 1]
x_train = tf.cast(tf.reshape(x_train, shape=(x_train.shape[0], 784)), tf.float32) / 255.0
x_test = tf.cast(tf.reshape(x_test, shape=(x_test.shape[0], 784)), tf.float32) / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation='sigmoid'),
    tf.keras.layers.Dense(256, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='softmax')
])
# fit() raises an error unless the model has been compiled first
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
I tried to write a custom implementation of basic neural network with
two hidden layers on MNIST dataset using tensorflow 2.0 beta but I'm
not sure what went wrong here but my training loss and accuracy seems
to stuck at 1.5 and around 85's respectively.
Where is the training part? TF 2.0 models are trained either with Keras' syntax or with eager execution and tf.GradientTape(). Can you paste the code with conv and dense layers, and how you trained it?
Other questions:
1) How to add a Dropout layer in this custom implementation? i.e
(making it work for both train and test time)
You can add a Dropout() layer with:
from tensorflow.keras.layers import Dropout
And then you insert it into a Sequential() model just with:
Dropout(dprob) # where dprob = dropout probability
2) How to add Batch Normalization in this code?
Same as before, with:
from tensorflow.keras.layers import BatchNormalization
The choice of where to put batchnorm in the model is up to you. There is no rule of thumb, so I suggest experimenting; with ML it's always a trial-and-error process.
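As one possible starting point, the sketch below places BatchNormalization between a Dense layer and its activation; this placement is a common choice, not a rule, and the layer sizes are illustrative. Like Dropout, the layer behaves differently depending on the `training` flag.

```python
import tensorflow as tf

# One common placement (an assumption, not a rule): Dense -> BatchNorm -> activation.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, use_bias=False),  # BN's beta offset makes this bias redundant
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

x = tf.random.normal((32, 784))
train_probs = model(x, training=True)   # uses batch statistics and updates the moving averages
test_probs = model(x, training=False)   # uses the stored moving mean/variance
```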
3) How can I use callbacks in this code, i.e. EarlyStopping and ModelCheckpoint?
If you are training using Keras' syntax, you can simply pass the callbacks to model.fit() through its callbacks argument. Please check this very thorough tutorial on how to use them. It takes just a few lines of code.
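A minimal sketch with Keras' syntax, assuming random stand-in data and an illustrative checkpoint file name: EarlyStopping halts training once the validation loss stops improving, and ModelCheckpoint saves the weights each time it improves.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Random stand-in data (assumption): 512 flattened samples, 10 classes.
x = np.random.rand(512, 784).astype('float32')
y = np.random.randint(0, 10, size=(512,))

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

ckpt_path = os.path.join(tempfile.mkdtemp(), 'best.weights.h5')  # illustrative location
callbacks = [
    # Stop after 3 epochs without val_loss improvement and restore the best weights.
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True),
    # Write the weights to disk whenever val_loss improves.
    tf.keras.callbacks.ModelCheckpoint(ckpt_path, monitor='val_loss',
                                       save_best_only=True, save_weights_only=True),
]
history = model.fit(x, y, validation_split=0.2, epochs=20, callbacks=callbacks, verbose=0)
```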
If you are running a model in Eager execution, you have to implement these techniques yourself, with your own code. It's more complex, but it also gives you more freedom in the implementation.
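For a custom Eager loop, early stopping boils down to tracking the best validation loss yourself. The sketch below uses a hypothetical `ManualEarlyStopping` helper (not a TensorFlow class) and a list of made-up validation losses in place of real evaluations; the place where you would save a checkpoint is marked in a comment.

```python
class ManualEarlyStopping:
    """Hypothetical helper for a custom training loop: tracks the best
    validation loss and reports when it has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float('inf')
        self.bad_epochs = 0
        self.improved = False

    def should_stop(self, val_loss):
        self.improved = val_loss < self.best - self.min_delta
        if self.improved:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = ManualEarlyStopping(patience=2)
stopped_at = None
# Made-up per-epoch validation losses standing in for real evaluations.
for epoch, val_loss in enumerate([1.0, 0.8, 0.85, 0.9, 0.7]):
    # ... run this epoch's training steps and compute val_loss here ...
    stop = stopper.should_stop(val_loss)
    if stopper.improved:
        pass  # save a checkpoint here, e.g. tf.train.Checkpoint(model=model).save(prefix)
    if stop:
        stopped_at = epoch
        break
```

Here training stops at epoch 3, after two epochs without beating the best loss of 0.8.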
4) Is there anything else in this code that I can optimize further, i.e. making use of the TensorFlow 2.x @tf.function decorator, etc.?
It depends. If you are using Keras syntax, I don't think you need to add anything. If you are training the model in Eager execution, then I'd suggest using the @tf.function decorator on your training-step function to speed it up a bit.
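To see why @tf.function helps, the sketch below counts how often the Python body actually runs. The function traces the Python code into a graph once per input signature, and later calls with the same shapes and dtypes reuse that graph. The `dense_relu` function is only illustrative; in a custom loop you would decorate your training step the same way.

```python
import tensorflow as tf

trace_count = 0

@tf.function
def dense_relu(x, w):
    global trace_count
    trace_count += 1  # Python side effect: runs only while tracing, not on every call
    return tf.nn.relu(tf.matmul(x, w))

w = tf.ones((4, 2))
for _ in range(3):
    y = dense_relu(tf.ones((8, 4)), w)
```

After three calls `trace_count` is still 1. Calling it with a different input shape would trigger a new trace.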
You can see a practical TF 2.0 example on how to use the decorator in this Notebook.
Other than this, I suggest experimenting with regularization techniques such as L1/L2 weight penalties, as well as with different weight initializations.
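A short sketch of both ideas in Keras, with illustrative penalty strengths and initializer: the regularizer adds a penalty term per layer to `model.losses`, and the initializer controls the starting weight distribution.

```python
import tensorflow as tf

# Penalty strengths and initializer are illustrative choices.
reg = tf.keras.regularizers.l1_l2(l1=1e-5, l2=1e-4)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation='relu', kernel_initializer='he_normal',
                          kernel_regularizer=reg),
    tf.keras.layers.Dense(256, activation='relu', kernel_initializer='he_normal',
                          kernel_regularizer=reg),
    tf.keras.layers.Dense(10, activation='softmax'),
])
_ = model(tf.zeros((1, 784)))  # build the model so the penalty terms exist

# compile()/fit() add these automatically; in a custom GradientTape loop,
# add this total to the data loss yourself.
penalty = tf.add_n(model.losses)
```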
5) I also need a way to extract the final weights of all layers after training, so I can plot them and check their distributions for issues like vanishing or exploding gradients.
Once the model is trained, you can extract its weights with:
weights = model.get_weights()
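which returns a list of NumPy arrays, or with:
weights = model.trainable_weights
if you want only the trainable ones (this returns the variables themselves rather than NumPy arrays).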
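A sketch of per-layer inspection, using an untrained copy of the sigmoid MLP as a stand-in for your trained model: it collects summary statistics for every trainable weight, keyed by layer name so kernels and biases of different layers don't collide. Plotting is left as a comment because it assumes matplotlib is installed.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation='sigmoid'),
    tf.keras.layers.Dense(256, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
_ = model(tf.zeros((1, 784)))  # build; in practice use your trained model

stats = {}
for layer in model.layers:
    for var in layer.trainable_weights:
        values = var.numpy().ravel()
        # (mean, std, max |w|) per weight tensor
        stats[(layer.name, var.name)] = (values.mean(), values.std(), np.abs(values).max())
        # e.g. plt.hist(values, bins=100) to look at the full distribution
```

Gradient magnitudes can be inspected the same way on the list returned by `tape.gradient(...)` inside a custom loop.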
6) I would also like help writing this code in a more generalized way, so I can easily implement other networks, like convolutional networks (i.e. Conv, MaxPool, etc.), based on it.
You can pack all your code into a function, then . At the end of this Notebook I did something like this (it's for a feed-forward NN, which is much simpler, but it's a start and you can change the code according to your needs).
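One way to generalize, sketched with a hypothetical `build_model` helper: describe the architecture as a list of (Keras layer class name, keyword arguments) pairs, so a dense network and a Conv/MaxPool network share the same building and training code. The specific layer sizes are illustrative.

```python
import tensorflow as tf

def build_model(layer_specs, input_shape):
    """Hypothetical helper: builds a Sequential model from (layer_name, kwargs)
    pairs, where layer_name is a class in tf.keras.layers."""
    layers = [tf.keras.Input(shape=input_shape)]
    for name, kwargs in layer_specs:
        layers.append(getattr(tf.keras.layers, name)(**kwargs))
    return tf.keras.Sequential(layers)

mlp_specs = [
    ('Flatten', {}),
    ('Dense', {'units': 256, 'activation': 'relu'}),
    ('Dense', {'units': 10, 'activation': 'softmax'}),
]
cnn_specs = [
    ('Conv2D', {'filters': 32, 'kernel_size': 3, 'activation': 'relu'}),
    ('MaxPooling2D', {'pool_size': 2}),
    ('Flatten', {}),
    ('Dense', {'units': 10, 'activation': 'softmax'}),
]
mlp = build_model(mlp_specs, input_shape=(28, 28, 1))
cnn = build_model(cnn_specs, input_shape=(28, 28, 1))
```

Both models can then be passed to the same compile/fit or GradientTape training code.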
---
UPDATE:
Please check my TensorFlow 2.0 implementation of a CNN classifier. This might be a useful hint: it is trained on the Fashion MNIST dataset, which makes it very similar to your task.

Tensorflow: how to use RNN initial state in an estimator with different batch size for training and testing?

I am working on a Tensorflow estimator using RNN (GRUCell).
I use zero_state to initialize the first state, which requires a fixed batch size.
My problem is that I want to be able to use the estimator to predict with a single sample (batchsize=1).
When it loads the serialized estimator, it complains that the batch size I use for prediction does not match the training batch size.
If I reconstruct the estimator with a different batch size, I cannot load what has been serialized.
Is there an elegant way to use zero_state in an estimator?
I saw some solutions that use a variable to store the batch size, but they rely on the feed_dict method. I can't find how to make that work in the context of an estimator.
Here is the core of my simple test RNN in the estimator:
cells = [ tf.nn.rnn_cell.GRUCell(self.getNSize()) for _ in range(self.getNLayers())]
multicell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
H_init = tf.Variable( multicell.zero_state( batchsize, dtype=tf.float32 ), trainable=False)
H = tf.Variable( H_init )
Yr, state = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=H)
Does anyone have a clue about this?
EDIT:
OK, I tried various things on this problem.
I am now trying to filter the variables I load from the checkpoint to remove 'H', which is used as the internal state of the recurrent cells. For prediction, I can leave it with all 0 values.
So far, I have done this:
First I define a hook:
class RestoreHook(tf.train.SessionRunHook):
def __init__(self, init_fn):
self.init_fn = init_fn
def after_create_session(self, session, coord=None):
print("--------------->After create session.")
self.init_fn(session)
Then in my model_fn:
if mode == tf.estimator.ModeKeys.PREDICT:
logits = tf.nn.softmax(logits)
# Do not restore H as it's batch size might be different.
vlist = tf.contrib.framework.get_variables_to_restore()
vlist = [ x for x in vlist if x.name.split(':')[0] != 'architecture/H']
init_fn = tf.contrib.framework.assign_from_checkpoint_fn(tf.train.latest_checkpoint(self.modelDir), vlist, ignore_missing_vars=True)
spec = tf.estimator.EstimatorSpec(mode=mode,
predictions = {
'logits': logits,
},
export_outputs={
'prediction': tf.estimator.export.PredictOutput( logits )
},
prediction_hooks=[RestoreHook(init_fn)])
I took this piece of code from https://github.com/tensorflow/tensorflow/issues/14713
But it does not work yet. It seems that it is still trying to load H from the file, although I checked that H is not in vlist.
I am still looking for a solution.
You can get the batch size at run time from another tensor, for example:
decoder_initial_state = cell.zero_state(array_ops.shape(attention_states)[0],
dtypes.float32).clone(cell_state=encoder_state)
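The same idea applied to the question's stacked GRU, sketched with TF 2.x Keras layers in place of MultiRNNCell and tf.nn.dynamic_rnn (a swapped-in API, not the asker's code; sizes are illustrative): the zero state is built from `tf.shape(x)[0]` at run time, so the same weights serve a batch of 64 during training and a batch of 1 at prediction. In TF 1.x the equivalent is `multicell.zero_state(tf.shape(Xo)[0], tf.float32)`, since zero_state accepts a scalar tensor as the batch size.

```python
import tensorflow as tf

UNITS, NUM_LAYERS, TIME_STEPS, FEATURES = 16, 2, 5, 3  # illustrative sizes

gru_layers = [tf.keras.layers.GRU(UNITS, return_sequences=True) for _ in range(NUM_LAYERS)]

def run_rnn(x):
    # Batch size is read from the input at run time, not fixed in the graph.
    batch_size = tf.shape(x)[0]
    h = x
    for layer in gru_layers:
        h0 = tf.zeros([batch_size, UNITS])  # zero initial state sized to this batch
        h = layer(h, initial_state=[h0])
    return h

train_out = run_rnn(tf.random.normal((64, TIME_STEPS, FEATURES)))
predict_out = run_rnn(tf.random.normal((1, TIME_STEPS, FEATURES)))  # same weights, batch of 1
```

Note that Keras' GRU already builds a zero initial state from the input's batch dimension when `initial_state` is omitted; the explicit version just makes the mechanism visible.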
I found a solution:
I create the variables for the initial state for both batchsize=64 and batchsize=1.
At training time, I use the first one to initialize the RNN.
At prediction time, I use the second one.
It works because both variables are serialized and restored by the estimator code, so it does not complain.
The drawback is that the query batch size (in my case, 1) must be known at training time (when both variables are created).

How to run asynchronous predictions with TensorFlow Estimator API?

I am using the tf.estimator API to predict punctuation. I trained it with pre-processed data using TFRecords and tf.train.shuffle_batch. Now I want to make predictions. I can do this fine feeding static NumPy data into tf.constant and returning this from the input_fn.
However I am working with sequence data and I need to feed one example at a time and the next input is dependent on the previous output. I also want to be able to process data input through HTTP requests.
Every time estimator.predict is called it re-loads the checkpoint and recreates the entire graph. This is slow and expensive. So I need to be able to dynamically feed data to the input_fn.
My current attempt is roughly this:
feature_input = tf.placeholder(tf.int32, shape=[1, MAX_SUBSEQUENCE_LEN])
q = tf.FIFOQueue(1, tf.int32, shapes=[[1, MAX_SUBSEQUENCE_LEN]])
enqueue_op = q.enqueue(feature_input)
def input_fn():
return q.dequeue()
estimator = tf.estimator.Estimator(model_fn, model_dir=model_file)
predictor = estimator.predict(input_fn=input_fn)
sess = tf.Session()
output = None
while True:
x = get_numpy_data(x, output)
if x is None:
break
sess.run(enqueue_op, {feature_input: x})
output = predictor.next()
save_to_file(output)
sess.close()
However I am getting the following error:
ValueError: Input graph and Layer graph are not the same: Tensor("EmbedSequence/embedding_lookup:0", shape=(1, 200, 128), dtype=float32) is not from the passed-in graph.
How can I asynchronously plug data into my existing graph through an input_fn to get predictions one at a time?
It turns out the main problem is that all tensors need to be created inside the input_fn or they don't get added to the same graph. I needed to run an enqueue operation but it was impossible to access anything returned from the input function.
I ended up inheriting the Estimator class and creating a custom predict function which allows me to dynamically add data to the prediction queue and return the results:
# async_estimator.py
import six
import tensorflow as tf
from tensorflow.python.estimator.estimator import Estimator
from tensorflow.python.estimator.estimator import _check_hooks_type
from tensorflow.python.estimator import model_fn as model_fn_lib
from tensorflow.python.framework import ops
from tensorflow.python.framework import random_seed
from tensorflow.python.training import saver
from tensorflow.python.training import training
class AsyncEstimator(Estimator):
def async_predictor(self,
dtype,
shape=None,
predict_keys=None,
hooks=None,
checkpoint_path=None):
"""Returns a tuple of functions: the first runs predictions on the model, the second cleans up
Args:
dtype: the dtype of the input
shape: the shape of the input placeholder (optional)
predict_keys: list of `str`, name of the keys to predict. It is used if
the `EstimatorSpec.predictions` is a `dict`. If `predict_keys` is used
then rest of the predictions will be filtered from the dictionary. If
`None`, returns all.
hooks: List of `SessionRunHook` subclass instances. Used for callbacks
inside the prediction call.
checkpoint_path: Path of a specific checkpoint to predict. If `None`, the
latest checkpoint in `model_dir` is used.
Returns:
(predict, finish): tuple of functions
predict: runs a single prediction and returns the results
Args:
x: NumPy array of input
Returns:
Evaluated value of the prediction
finish: closes the session, allowing the program to exit
Raises:
ValueError: Could not find a trained model in model_dir.
ValueError: if batch length of predictions are not same.
ValueError: If there is a conflict between `predict_keys` and
`predictions`. For example if `predict_keys` is not `None` but
`EstimatorSpec.predictions` is not a `dict`.
"""
hooks = _check_hooks_type(hooks)
# Check that model has been trained.
if not checkpoint_path:
checkpoint_path = saver.latest_checkpoint(self._model_dir)
if not checkpoint_path:
raise ValueError('Could not find trained model in model_dir: {}.'.format(
self._model_dir))
with ops.Graph().as_default() as g:
random_seed.set_random_seed(self._config.tf_random_seed)
training.create_global_step(g)
input_placeholder = tf.placeholder(dtype=dtype, shape=shape)
queue = tf.FIFOQueue(1, dtype, shapes=shape)
enqueue_op = queue.enqueue(input_placeholder)
features = queue.dequeue()
estimator_spec = self._call_model_fn(features, None,
model_fn_lib.ModeKeys.PREDICT)
predictions = self._extract_keys(estimator_spec.predictions, predict_keys)
mon_sess = training.MonitoredSession(
session_creator=training.ChiefSessionCreator(
checkpoint_filename_with_path=checkpoint_path,
scaffold=estimator_spec.scaffold,
config=self._session_config),
hooks=hooks)
def predict(x):
if mon_sess.should_stop():
raise StopIteration
mon_sess.run(enqueue_op, {input_placeholder: x})
preds_evaluated = mon_sess.run(predictions)
if not isinstance(predictions, dict):
return preds_evaluated
else:
preds = []
for i in range(self._extract_batch_length(preds_evaluated)):
preds.append({
key: value[i]
for key, value in six.iteritems(preds_evaluated)
})
return preds
def finish():
mon_sess.close()
return predict, finish
And here is the rough code to use it:
import tensorflow as tf
from async_estimator import AsyncEstimator
def doPrediction(model_fn, model_dir, max_seq_length):
estimator = AsyncEstimator(model_fn, model_dir=model_dir)
predict, finish = estimator.async_predictor(dtype=tf.int32, shape=(1, max_seq_length))
output = None
while True:
# my input is dependent on the previous output
x = get_numpy_data(output)
if x is None:
break
output = predict(x)
save_to_disk(output)
finish()
Note: this is a simple solution that works for my needs; it may need to be modified for other cases. It works on TensorFlow 1.2.1.
Hopefully TF will officially adopt something like this to make serving dynamic predictions with Estimator easier.

Tensorflow, feeding Estimator.fit(batch)

Could you provide an example of using the high-level Estimator API with placeholders and feeding batches, as in this basic use case:
for step in xrange(max_steps):
batch_of_inputs,batch_of_targets= get_batch_from_disk(step)# e.g.batches are stored as list where step is and index of the list
feed_dict = {x:batch_of_inputs,y:batch_of_targets}
_, loss_value = sess.run([train_op, loss],
feed_dict=feed_dict)
How to do the same with Estimator API?
Estimator's fit function takes batch_size, steps, input_fn or feed_fn as arguments (see doc https://www.tensorflow.org/versions/master/api_docs/python/contrib.learn/estimators), but it is not clear to me how to implement a function that loads data in batches from, e.g., disk.
I don't think estimators are really meant to be used with placeholders. They use the concept of an input_fn, which is properly described here.
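As a sketch of the input_fn route, the example below wraps a batch loader in a tf.data pipeline. `get_batch_from_disk` is a hypothetical stand-in for the asker's loader (here it just generates random arrays), and the batch count and shapes are illustrative. An input_fn that returns a Dataset of (features, labels) pairs like this one can be passed as `estimator.train(input_fn=input_fn)`.

```python
import numpy as np
import tensorflow as tf

NUM_BATCHES, BATCH_SIZE, NUM_FEATURES = 10, 32, 8  # illustrative sizes

def get_batch_from_disk(step):
    # Hypothetical loader: returns (inputs, targets) for batch number `step`.
    rng = np.random.default_rng(step)
    inputs = rng.standard_normal((BATCH_SIZE, NUM_FEATURES)).astype(np.float32)
    targets = rng.integers(0, 2, size=(BATCH_SIZE,)).astype(np.int64)
    return inputs, targets

def input_fn():
    def gen():
        for step in range(NUM_BATCHES):
            x, y = get_batch_from_disk(step)
            yield {'x': x}, y  # (features dict, labels), one whole batch per element
    return tf.data.Dataset.from_generator(
        gen,
        output_signature=(
            {'x': tf.TensorSpec(shape=(None, NUM_FEATURES), dtype=tf.float32)},
            tf.TensorSpec(shape=(None,), dtype=tf.int64)))
```

Because the batches are already formed on disk, the dataset is not batched again; if the loader returned single examples instead, you would add `.batch(BATCH_SIZE)`.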
If you really need to use a placeholder, you might use a FeedFnHook:
def input_fn(): # empty input_fn, returns features and labels
return {}, {}
feed_dict = {x:batch_of_inputs,y:batch_of_targets}
def feed_fn(): # feed_fn with hardcoded feed_dict
return feed_dict
hooks = [tf.train.FeedFnHook(feed_fn=feed_fn)]
estimator.train(input_fn=input_fn, hooks=hooks, steps=1)

Categories

Resources