Including deviation from expected model output in a cost function - python

I'm trying to write a simple model that does the following. I want the model to take an input and compute an ordinary feedforward output with linear activation, K.bias_add(K.dot(x, self.kernel), self.bias), with the kernel and bias as trainable parameters. I also want the model to contain a trainable parameter x_out that holds an estimate of the output. The goal is for the model to return x_out as its output, but to include norm(W*x + b - x_out) as a penalty term in the cost function. I've tried doing this with a wrapper function for a custom loss, but have not succeeded so far.
from keras import backend as K
from keras.layers import Layer

class MyFeedforward(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyFeedforward, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create trainable weight variables for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        self.bias = self.add_weight(name='bias',
                                    shape=(self.output_dim,),
                                    initializer='uniform',
                                    trainable=True)
        super(MyFeedforward, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.bias_add(K.dot(x, self.kernel), self.bias)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
class MyLayer(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # A frozen zero kernel plus a trainable "bias" that holds the output estimate.
        self.zeros = self.add_weight(name='zeros',
                                     shape=(input_shape[1], self.output_dim),
                                     initializer='Zeros',
                                     trainable=False)
        self.output_estimate = self.add_weight(name='output_estimate',
                                               shape=(self.output_dim,),
                                               initializer='uniform',
                                               trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.bias_add(K.dot(x, self.zeros), self.output_estimate)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
from keras.models import Model
from keras import layers
from keras import Input

input_tensor = layers.Input(shape=(784,))
output_estimate = MyLayer(10)(input_tensor)
calculated_output = MyFeedforward(10)(input_tensor)

model = Model(input_tensor, [calculated_output, output_estimate])
model.summary()
My hope is to have both the MyFeedforward layer (which computes K.bias_add(K.dot(x, kernel), bias)) and the MyLayer layer (which simply outputs the estimate x_out) available to the cost function. The ultimate goal is to optimize something like the categorical crossentropy between the training labels and x_out, plus the squared deviation between x_out and the computed output, K.sum(K.square(x_out - K.bias_add(K.dot(x, kernel), bias))). If I can do this without setting up separate layers, that would work equally well.
I've tried setting this up with a wrapper for a custom loss function and a multi-head model (one head for the x_out layer, one for the normal layer), but it appears that I end up not having access to the normal layer's output when I do this. (I need y_pred from another layer, or access to the input so that I can grab layer.kernel and layer.bias and apply them to the input for the computation.)
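One pattern that might get around this (a minimal sketch, not a tested solution for this exact setup): compute the deviation inside the graph with a Subtract layer and give it its own loss against a zero target, so both layers' weights stay reachable by backpropagation. The loss weights below are hypothetical.
import numpy as np
from keras.models import Model
from keras import layers

# Assumes MyFeedforward and MyLayer are defined as in the question.
input_tensor = layers.Input(shape=(784,))
x_out = MyLayer(10)(input_tensor)            # trainable estimate of the output
wx_plus_b = MyFeedforward(10)(input_tensor)  # W*x + b

# Compute the deviation inside the graph so both layers stay trainable.
deviation = layers.Subtract()([wx_plus_b, x_out])

model = Model(input_tensor, [x_out, deviation])
model.compile(optimizer='adam',
              loss=['categorical_crossentropy', 'mse'],  # mse against a zero target ~ squared deviation
              loss_weights=[1.0, 0.1])                   # hypothetical penalty weight

# Train with a dummy zero target for the deviation head, e.g.:
# model.fit(x_train, [y_train, np.zeros((len(x_train), 10))], epochs=10)
Note that MSE against zeros is the mean rather than the sum of the squared deviations, but it plays the same role as the penalty term described above.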

Related

How do I convert the TensorFlow Dense(kernel_constraint=max_norm) layer to PyTorch code?

Dense(self.latent_dim, kernel_constraint=max_norm(0.5))(en_conv)
I want to convert the above TensorFlow code to PyTorch, but I don't understand kernel_constraint=max_norm(0.5). How can I convert it?
One possible way is to do it with a custom layer that you can use in the model. The kernel constraint is an extra condition attached to the weights of an otherwise ordinary Dense layer, in the same way that you can attach a specific initializer.
Sample: a Dense layer with explicit initial weights. You can use tf.zeros(), tf.ones(), a random initializer, or tf.constant(), but training does not always converge to the same point; to explore the possibilities you can start from a specific initialization, or continue from already-trained values.
# ---------------------------------------------------------
# Simple Dense
# ---------------------------------------------------------
class SimpleDense(tf.keras.layers.Layer):
    def __init__(self, units=32):
        super(SimpleDense, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
Sample: as the question requires, a Dense layer with the MaxNorm kernel constraint.
import tensorflow as tf

# ---------------------------------------------------------
# Expected output:
# [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
# None
# ---------------------------------------------------------
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
config = tf.config.experimental.set_memory_growth(physical_devices[0], True)
print(physical_devices)
print(config)

# ---------------------------------------------------------
# Class / Functions
# ---------------------------------------------------------
class MaxNorm(tf.keras.layers.Layer):
    def __init__(self, max_value=2, axis=1):
        super(MaxNorm, self).__init__()
        self._out_shape = None
        self.max_value = max_value
        self.axis = axis

    def build(self, input_shape):
        self._out_shape = input_shape

    def call(self, inputs):
        # Wrap a Dense layer whose kernel is constrained by MaxNorm.
        dense = tf.keras.layers.Dense(
            inputs.shape[1],
            kernel_constraint=tf.keras.constraints.MaxNorm(max_value=self.max_value, axis=self.axis),
            activation=None)
        return dense(inputs)

# ---------------------------------------------------------
# Tasks
# ---------------------------------------------------------
temp = tf.constant([[ 0.00346701, -0.00676209, -0.00109781, -0.0005832 , 0.00047849, 0.00311204, 0.00843922, -0.00400238, 0.00127922, -0.0026469 ,
-0.00232184, -0.00686269, 0.00021552, -0.0039388 , 0.00753652,
-0.00405236, -0.0008759 , 0.00275771, 0.00144688, -0.00361056,
-0.0036177 , 0.00778807, -0.00116923, 0.00012773, 0.00276652,
0.00438983, -0.00769166, -0.00432891, -0.00211244, -0.00594028,
0.01009954, 0.00581804, -0.0062736 , -0.00921499, 0.00710281,
0.00022364, 0.00051054, -0.00204145, 0.00928543, -0.00129213,
-0.00209933, -0.00212295, -0.00452125, -0.00601313, -0.00239222,
0.00663724, 0.00228883, 0.00359715, 0.00090024, 0.01166699,
-0.00281386, -0.00791688, 0.00055902, 0.00070648, 0.00052972,
0.00249906, 0.00491098, 0.00528313, -0.01159694, -0.00370812,
-0.00950641, 0.00408999, 0.00800613, 0.0014898 ]], dtype=tf.float32)
layer = MaxNorm(max_value=2)
print( layer( temp )[0][tf.math.argmax(layer( temp )[0]).numpy()] )
layer = MaxNorm(max_value=4)
print( layer( temp )[0][tf.math.argmax(layer( temp )[0]).numpy()] )
layer = MaxNorm(max_value=10)
print( layer( temp )[0][tf.math.argmax(layer( temp )[0]).numpy()] )
Output: one way to check the custom layer is to start from values near zero, or from values where you already know what to expect. Starting from zero the outputs vary less; with non-zero starts the effect shows up mainly in the magnitudes. Note how the largest output grows with max_value:
tf.Tensor(-0.8576179, shape=(), dtype=float32)
tf.Tensor(0.6010429, shape=(), dtype=float32)
tf.Tensor(2.2286513, shape=(), dtype=float32)
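For the PyTorch half of the question, here is a minimal sketch of one common reading of the constraint (an assumption, not a drop-in translation): kernel_constraint=max_norm(0.5) clips the L2 norm of each unit's incoming weight vector after every update, which can be emulated by renormalizing the Linear layer's weight after each optimizer step with torch.renorm.
import torch
import torch.nn as nn

linear = nn.Linear(64, 16)  # hypothetical stand-in for Dense(self.latent_dim, ...)
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

def apply_max_norm(layer, max_value=0.5):
    # Clamp the L2 norm of each output unit's incoming weight vector,
    # mirroring Keras' MaxNorm constraint (which is applied after each update).
    with torch.no_grad():
        layer.weight.copy_(torch.renorm(layer.weight, p=2, dim=0, maxnorm=max_value))

# Inside a training loop:
# loss.backward()
# optimizer.step()
# apply_max_norm(linear, 0.5)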

tensorflow autodiff slower than pytorch's counterpart

I am using TensorFlow 2.0 and trying to evaluate gradients for backpropagation through a simple feedforward neural network. Here's what my model looks like:
def __init__(self, input_size, output_size):
    inputs = tf.keras.Input(shape=(input_size,))
    hidden_layer1 = tf.keras.layers.Dense(30, activation='relu')(inputs)
    outputs = tf.keras.layers.Dense(output_size)(hidden_layer1)
    self.model = tf.keras.Model(inputs=inputs, outputs=outputs)
    self.optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    self.loss_function = tf.keras.losses.Huber()
The forward pass through this network is fine, but when I use gradient tape to train the model, it is at least 10x slower than PyTorch.
Training function:
def learn_modified_x(self, inputs, targets, actions):
    with tf.GradientTape() as tape:
        predictions = self.model(inputs)
        predictions_for_action = gather_single_along_axis(predictions, actions)
        loss = self.loss_function(targets, predictions_for_action)
    grads = tape.gradient(loss, self.model.trainable_weights)
    self.optimizer.apply_gradients(zip(grads, self.model.trainable_weights))
I tried commenting out lines to find what is actually causing the problem, and discovered that tape.gradient is the main contributor.
Any idea?
PyTorch implementation
def __init__(self, input_size, nb_action):
    super(Network, self).__init__()
    self.input_size = input_size
    self.nb_action = nb_action
    self.fc1 = nn.Linear(input_size, 30)
    self.fc2 = nn.Linear(30, nb_action)

def forward(self, state):
    x = F.relu(self.fc1(state))
    q_values = self.fc2(x)
    return q_values

def learn(self, batch_state, batch_next_state, batch_reward, batch_action):
    outputs = self.model(batch_state).gather(1, batch_action.unsqueeze(1)).squeeze(1)
    next_outputs = self.model(batch_next_state).detach().max(1)[0]
    target = self.gamma * next_outputs + batch_reward
    td_loss = F.smooth_l1_loss(outputs, target)
    self.optimizer.zero_grad()
    td_loss.backward(retain_variables=True)
    self.optimizer.step()
def __init__(self, ...):
    ...
    self.model.call = tf.function(self.model.call)
    ...
You need to use tf.function to wrap your model's call function.
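A related option (a sketch assuming inputs with stable shapes and dtypes, and omitting the action gather from the question): compile the whole training step with tf.function rather than just the model call, so the tape, the gradient computation, and the optimizer update all run as a single graph.
import tensorflow as tf

@tf.function  # traced into a graph on the first call
def train_step(model, optimizer, loss_function, inputs, targets):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        loss = loss_function(targets, predictions)
    grads = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss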

How to convert this code from Keras to Tensorflow?

I am trying to convert code from Keras to TensorFlow. I don't have much of an idea about the Keras API; I am a TensorFlow user. Here is the Keras code:
rawmeta = layers.Input(shape=(1,), dtype="string")
emb = elmolayer()(rawmeta)
d1 = layers.Dense(256, activation='relu')(emb)
yhat = layers.Dense(31, activation='softmax', name = "output_node")(d1)
model = Model(inputs=[rawmeta], outputs=yhat)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
where elmolayer is defined as follows:
class elmolayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(elmolayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(elmolayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True,
                           signature='default',
                           )['default']
        return result

    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '--PAD--')

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.dimensions)
My TensorFlow implementation of this code is:
class Base_model(object):
    def __init__(self, elmo_embedding_matrix):
        tf.reset_default_graph()

        # define placeholders
        sentences = tf.placeholder(tf.int32, [None, None], name='sentences')
        y_true = tf.placeholder(tf.int32, [None, None], name='labels')

        self.elmo = tf.get_variable(name="relation_embedding",
                                    shape=[elmo_embedding_matrix.shape[0], elmo_embedding_matrix.shape[1]],
                                    initializer=tf.constant_initializer(np.array(elmo_embedding_matrix)),
                                    trainable=True, dtype=tf.float32)

        embedding_lookup = tf.nn.embedding_lookup(self.elmo, sentences)
        d1 = tf.layers.dense(embedding_lookup, 256, tf.nn.relu)
        y_pred = tf.layers.dense(d1, 31, tf.nn.softmax)

        matches = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1))
        acc = tf.reduce_mean(tf.cast(matches, tf.float32))

        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=y_pred))
        train = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cross_entropy)
My confusion is that the last dense layer in the Keras model is:
yhat = layers.Dense(31, activation='softmax', name = "output_node")(d1)
while in the TensorFlow code, if I am using tf.nn.softmax_cross_entropy_with_logits_v2, should I pass the second dense layer through a softmax, e.g.
y_pred = tf.layers.dense(d1, 31, tf.nn.softmax)
Because if I use softmax here, then tf.nn.softmax_cross_entropy_with_logits_v2 will apply softmax again to the logits.
How do I convert that Keras code to TensorFlow?
Specifying the comment here (Answer Section), even though it is present in the Comments Section, for the benefit of the community.
The TensorFlow equivalent of the Keras output layer,
yhat = layers.Dense(31, activation='softmax', name = "output_node")(d1)
is
y_logits = tf.layers.dense(d1, 31)  # no activation here: these are the raw logits
y_pred = tf.nn.softmax(y_logits)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true,logits=y_logits))
Hope this helps. Happy Learning!

How can I use a Keras neural network on data without labels? (just some inputs and no labels)

I want to use Keras for a deep feedforward neural network, but without using model.fit. Something like this, but using Keras rather than tf.layers:
h_1 = tf.layers.dense(inputs=inputs, units=self.n_1, activation=tf.nn.leaky_relu, kernel_regularizer=regularizer)
h_2 = tf.layers.dense(inputs=h_1, units=self.n_2, activation=tf.nn.leaky_relu, kernel_regularizer=regularizer)
h_3 = tf.layers.dense(inputs=h_2, units=self.n_3, activation=tf.nn.leaky_relu, kernel_regularizer=regularizer)
h_4 = tf.layers.dense(inputs=h_3, units=self.n_3, activation=tf.nn.leaky_relu, kernel_regularizer=regularizer)
out = tf.layers.dense(inputs=h_4, units=self.a_dim, activation=tf.nn.tanh, kernel_regularizer=regularizer)
This is the class that I have built for the Keras neural network:
from keras import backend as K
from keras.layers import Layer
from keras import activations
import tensorflow as tf

class actorLayer(Layer):
    def __init__(self, output_dim, activation=None, **kwargs):
        self.output_dim = output_dim
        self.activation = activations.get(activation)
        super(actorLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(actorLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return self.activation(K.dot(x, self.kernel))

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
and this is how I use it:
x_in = Input(shape=(7,))
x = actorLayer(128,activation='relu')(x_in)
x = actorLayer(64,activation='relu')(x)
x = actorLayer(64,activation='relu')(x)
x = actorLayer(16)(x)
x = actorLayer(2)(x)
I need something like tf.layers, where I can use the layer's output (out in the code above).
I use model = Model(inputs=x_in, outputs=x) to build my model, but I don't know how to use its output and return out, and I can't use model.fit.
I have used out = model(input), but it does not work. I am using this network for reinforcement learning, so I need to get its output and then optimize its weights myself.
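One way this is usually handled (a minimal sketch assuming tf.keras on TF 2.x in eager mode, with a placeholder objective standing in for the real RL loss): call the model directly on a batch to get its output, then compute and apply the gradients yourself with GradientTape instead of model.fit.
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the network above, built with stock Dense layers.
x_in = tf.keras.Input(shape=(7,))
h = tf.keras.layers.Dense(128, activation='relu')(x_in)
h = tf.keras.layers.Dense(64, activation='relu')(h)
out = tf.keras.layers.Dense(2, activation='tanh')(h)
model = tf.keras.Model(inputs=x_in, outputs=out)

optimizer = tf.keras.optimizers.Adam(1e-3)
states = np.random.rand(32, 7).astype('float32')  # made-up unlabeled batch

with tf.GradientTape() as tape:
    actions = model(states, training=True)  # forward pass, the "out" you want to use
    loss = -tf.reduce_mean(actions)         # placeholder objective; replace with the RL loss
grads = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))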

Keras Custom Layer ValueError: An operation has `None` for gradient.

I have created a custom Keras Conv2D layer as follows:
class CustConv2D(Conv2D):
    def __init__(self, filters, kernel_size, kernelB=None, activation=None, **kwargs):
        self.rank = 2
        self.num_filters = filters
        self.kernel_size = conv_utils.normalize_tuple(kernel_size, self.rank, 'kernel_size')
        self.kernelB = kernelB
        self.activation = activations.get(activation)
        super(CustConv2D, self).__init__(self.num_filters, self.kernel_size, **kwargs)

    def build(self, input_shape):
        if K.image_data_format() == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1
        if input_shape[channel_axis] is None:
            raise ValueError('The channel dimension of the inputs '
                             'should be defined. Found `None`.')
        input_dim = input_shape[channel_axis]

        num_basis = K.int_shape(self.kernelB)[-1]
        kernel_shape = (num_basis, input_dim, self.num_filters)

        self.kernelA = self.add_weight(shape=kernel_shape,
                                       initializer=RandomUniform(minval=-1.0, maxval=1.0, seed=None),
                                       name='kernelA',
                                       regularizer=self.kernel_regularizer,
                                       constraint=self.kernel_constraint)

        self.kernel = K.sum(self.kernelA[None, None, :, :, :] * self.kernelB[:, :, :, None, None], axis=2)

        # Set input spec.
        self.input_spec = InputSpec(ndim=self.rank + 2, axes={channel_axis: input_dim})
        self.built = True
        super(CustConv2D, self).build(input_shape)
I use CustConv2D as the first conv layer of my model.
img = Input(shape=(width, height, 1))
l1 = CustConv2D(filters=64, kernel_size=(11, 11), kernelB=basis_L1, activation='relu')(img)
The model compiles fine, but gives me the following error while training.
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Is there a way to figure out which operation is throwing the error? Also, is there any implementation error in the way I am writing the custom layer?
You're destroying your build by calling the original Conv2D build: your self.kernel will be replaced, so self.kernelA will never be used and backpropagation will never reach it.
The parent class also expects biases and all the regular stuff:
class CustConv2D(Conv2D):
    def __init__(self, filters, kernel_size, kernelB=None, activation=None, **kwargs):
        #...
        #...

        # Don't use bias if you're not defining it:
        super(CustConv2D, self).__init__(self.num_filters, self.kernel_size,
                                         activation=activation,
                                         use_bias=False, **kwargs)
        # Bonus: don't forget to add the activation to the call above;
        # it will also replace all your `self.anything` defined before this call.

    def build(self, input_shape):
        #...
        #...

        # Don't use bias:
        self.bias = None

        # Consider the layer built.
        self.built = True

        # Do not destroy your build:
        # super(CustConv2D, self).build(input_shape)
It may also be because there are some weights in your code that are defined but not used in the calculation of the output; their gradient with respect to the loss is then None/undefined.
A coded out example can be found here: https://github.com/keras-team/keras/issues/12521#issuecomment-496743146
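As a quick diagnostic (a sketch assuming TF 2.x with tf.keras, not the exact setup in the question), you can list which trainable weights end up with a None gradient:
import tensorflow as tf

def report_none_gradients(model, loss_fn, x_batch, y_batch):
    # Print the names of trainable weights whose gradient is None.
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)
        loss = loss_fn(y_batch, predictions)
    grads = tape.gradient(loss, model.trainable_weights)
    for weight, grad in zip(model.trainable_weights, grads):
        if grad is None:
            print("No gradient for:", weight.name)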
