Suppose I have two inputs (each with a number of features), that I want to feed into a Dropout layer. I want each iteration to drop out a whole input, with all of its associated features, and keep the whole of the other input.
After concatenating the inputs, I think I need to use the noise_shape parameter for Dropout, but the shape of the concatenated layer doesn't really let me do that. For two inputs of shape (15,), the concatenated shape is (None, 30), rather than (None, 15, 2), so one of the axes is lost and I can't drop out along it.
Any suggestions for what I could do? Thanks.
from keras.layers import Input, concatenate, Dense, Dropout
x = Input((15,)) # 15 features for the 1st input
y = Input((15,)) # 15 features for the 2nd input
xy = concatenate([x, y])
print(xy._keras_shape)
# (None, 30)
layer = Dropout(rate=0.5, noise_shape=[xy.shape[0], 1])(xy)
...
EDIT:
It seems I misunderstood your question; here is the updated answer based on your requirement.
To achieve what you want, x and y effectively become the timesteps, and according to the Keras documentation, noise_shape=(batch_size, 1, features) if your input shape is (batch_size, timesteps, features):
x = Input((15,1)) # 15 features for the 1st input
y = Input((15,1)) # 15 features for the 2nd input
xy = concatenate([x, y])
dropout_layer = Dropout(rate=0.5, noise_shape=[None, 1, 2])(xy)
...
To test that you are getting the correct behavior, you can inspect the intermediate xy layer and dropout_layer using the following code (reference link):
### Define your model ###
from keras.layers import Input, concatenate, Dropout
from keras.models import Model
from keras import backend as K
# Learning phase must be set to 1 for dropout to work
K.set_learning_phase(1)
x = Input((15,1)) # 15 features for the 1st input
y = Input((15,1)) # 15 features for the 2nd input
xy = concatenate([x, y])
dropout_layer = Dropout(rate=0.5, noise_shape=[None, 1, 2])(xy)
model = Model(inputs=[x, y], outputs=dropout_layer)
# specify inputs and output of the model
x_inp = model.input[0]
y_inp = model.input[1]
outp = [layer.output for layer in model.layers[2:]]
functor = K.function([x_inp, y_inp], outp)
### Get some random inputs ###
import numpy as np
input_1 = np.random.random((1,15,1))
input_2 = np.random.random((1,15,1))
layer_outs = functor([input_1,input_2])
print('Intermediate xy layer:\n\n',layer_outs[0])
print('Dropout layer:\n\n', layer_outs[1])
You should see that the entire x or y is dropped randomly (with 50% probability), per your requirement:
Intermediate xy layer:
[[[0.32093528 0.70682645]
[0.46162075 0.74063486]
[0.522718 0.22318116]
[0.7897043 0.7849486 ]
[0.49387926 0.13929296]
[0.5754296 0.6273373 ]
[0.17157765 0.92996144]
[0.36210892 0.02305864]
[0.52637625 0.88259524]
[0.3184462 0.00197006]
[0.67196816 0.40147918]
[0.24782693 0.5766827 ]
[0.25653633 0.00514544]
[0.8130438 0.2764429 ]
[0.25275478 0.44348967]]]
Dropout layer:
[[[0. 1.4136529 ]
[0. 1.4812697 ]
[0. 0.44636232]
[0. 1.5698972 ]
[0. 0.2785859 ]
[0. 1.2546746 ]
[0. 1.8599229 ]
[0. 0.04611728]
[0. 1.7651905 ]
[0. 0.00394012]
[0. 0.80295837]
[0. 1.1533654 ]
[0. 0.01029088]
[0. 0.5528858 ]
[0. 0.88697934]]]
If you are wondering why all the elements are multiplied by 2, take a look at how tensorflow implemented dropout here.
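For reference, here is a minimal NumPy sketch of that inverted-dropout scaling (my own illustration, not the actual TensorFlow code): surviving values are divided by the keep probability 1 - rate, which is why they appear doubled at rate=0.5.
import numpy as np
rate = 0.5
xy = np.random.random((1, 15, 2))
# drop each of the two inputs as a whole, mirroring noise_shape=[None, 1, 2]
mask = np.random.binomial(1, 1 - rate, size=(1, 1, 2))
dropped = xy * mask / (1 - rate)  # kept values are scaled by 1 / (1 - rate) = 2
print(dropped[0, :3])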
Hope this helps.
Can someone please explain the dimensionality logic for the input X and the class labels Y
for the sparse_categorical_crossentropy loss function?
I checked both the Keras and tf2 docs and examples, and this post,
Cross Entropy vs Sparse, but one point is not clear to me.
Does the Y vector need to be expanded to the same number of columns as
the number of classes the model outputs (if I use a softmax output), or
does Keras automatically expand Y?
In my case, I have 32x32 input images, and Y is a number between 0 and 10.
So the input is (batch_size, h, w), and Y is (batch_size,) with integer values 0...10.
X = (73257, 32, 32)
Y = (73257, 1)
model.fit(X, Y, epochs=30, validation_split=0.10, batch_size=1, verbose=True)
The model itself is just a Sequential stack of Dense layers with a softmax output.
model = Sequential()
model.add(Dense(32, activation='relu',
                input_shape=input_shape,
                kernel_initializer='he_uniform',
                bias_initializer='ones'))
# bunch of Dense layer and output softmax
model.add(Dense(10, activation='softmax'))
The error I get is a dimensionality mismatch:
ValueError: Shape mismatch: The shape of labels (received (1, 1)) should equal the shape of logits except for the last dimension (received (1, 32, 10)).
Thank you.
As mentioned in that post, both categorical cross-entropy (cce) and sparse categorical cross-entropy (scc) use the same loss function; the only difference is the format of the true label Y. Simply put, if Y is an integer you would use scc, whereas if Y is one-hot you would use cce. So for scc the ground truth Y is mostly 1D, whereas for cce the ground truth Y is mostly 2D. For the ground truth:
- (num_of_samples, n_class_one_hot_encode) <- for cce (2D)
- (num_of_samples, n_class_int) <- for scc (1D)
For example, if we use the cifar10 data set, we can do
import tensorflow as tf
(x_train, y_train), (_, _) = tf.keras.datasets.cifar10.load_data()
# train set / data
x_train = x_train.astype('float32') / 255
sparse = y_train
onehot = y_train
onehot = tf.keras.utils.to_categorical(onehot , num_classes=10)
print(sparse[:5]) # < --- (num_of_samples, n_class_int)
print(onehot[:5]) # < --- (num_of_samples, n_class_one_hot_encode)
[[6]
[9]
[9]
[4]
[1]]
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
Now, let's define a simple model, train it with both of the above, and see what happens.
def net():
    input = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(input)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)
    x = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(input, x)
    return model
Using cce
model = net()
model.compile(
    loss = tf.keras.losses.CategoricalCrossentropy(),
    metrics = 'accuracy',
    optimizer = 'adam')
his = model.train_on_batch(x_train, onehot, return_dict=True)
print(his)
{'loss': 2.376708984375, 'accuracy': 0.09651999920606613}
one_hot_pred = model.predict(x_train)
print(onehot[0])
print(one_hot_pred[0])
print(onehot[0].shape)
print(one_hot_pred[0].shape)
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0.1516315 0.1151238 0.11732318 0.10644271 0.08946694 0.1398355
0.05046898 0.04249624 0.11813554 0.06907552]
(10,)
(10,)
Now, using scc
model = net()
model.compile(
    loss = tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics = 'accuracy',
    optimizer = 'adam')
his = model.train_on_batch(x_train, sparse, return_dict=True)
print(his)
{'loss': 2.331458806991577, 'accuracy': 0.10066000372171402}
sparse_pred = model.predict(x_train)
print(sparse[0])
print(sparse_pred[0])
print(sparse[0].shape)
print(sparse_pred[0].shape)
[6]
[0.07184976 0.08837385 0.06910037 0.12347631 0.09542189 0.09981853
0.11247937 0.06707954 0.14902702 0.12337337]
(1,)
(10,)
Observe that the ground-truth and prediction shapes for scc are (1,) and (10,). In this case, the loss computes the logarithm only at the output index that the ground truth points to. For example, the ground truth here is 6, so the loss uses only the logarithm of pred[6]. There are some more details about it here.
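Here is a minimal sketch of that behaviour with made-up probabilities (not the model output above), just to show that scc reduces to -log(pred[6]) when the label is 6:
import numpy as np
import tensorflow as tf
pred = tf.constant([[0.07, 0.09, 0.07, 0.12, 0.10, 0.10, 0.11, 0.07, 0.15, 0.12]])
y_true = tf.constant([6])  # integer label, as scc expects
scc = tf.keras.losses.SparseCategoricalCrossentropy()
print(scc(y_true, pred).numpy())    # ~2.207
print(-np.log(pred.numpy()[0, 6]))  # -log(pred[6]) computed by hand, same value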
It's easy to make a whole tensor untrainable with trainable=False. But can I make only part of a tensor untrainable?
Suppose I have a 2x2 tensor; I want only one element untrainable and the other three elements trainable.
Like this (I want the (1,1) element always to be zero, and the other three elements updated by the optimizer):
untrainable trainable
trainable trainable
Thanks.
Short answer: you can't.
Longer answer: you can mimic that effect by setting part of the gradient to zero after the computation of the gradient so that part of the variable is never updated.
Here is an example:
import tensorflow as tf
tf.random.set_seed(0)
model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="sigmoid", input_shape=(2,), name="first"), tf.keras.layers.Dense(1,activation="sigmoid")])
X = tf.random.normal((1000,2))
y = tf.reduce_sum(X, axis=1)
ds = tf.data.Dataset.from_tensor_slices((X, y)).batch(1)  # batch so the model receives 2-D inputs
In this example, the first layer has a kernel W with the following values:
>>> model.get_layer("first").trainable_weights[0]
<tf.Variable 'first/kernel:0' shape=(2, 2) dtype=float32, numpy=
array([[ 0.13573623, -0.68269 ],
[ 0.8938798 , 0.6792033 ]], dtype=float32)>
We then write a custom loop that will only update the first row of that weight W:
loss = tf.losses.MSE
opt = tf.optimizers.SGD(1.)  # high learning rate to make the change visible
for xx, yy in ds.take(1):
    with tf.GradientTape() as tape:
        l = loss(yy, model(xx))
    g = tape.gradient(l, model.get_layer("first").trainable_weights[0])
    gradient_slice = g[:1]  # keep the gradient of the first row
    new_grad = tf.concat([gradient_slice, tf.zeros((1, 2), dtype=tf.float32)], axis=0)  # zeros for the rest
    opt.apply_gradients(zip([new_grad], [model.get_layer("first").trainable_weights[0]]))
And then, after running that loop, we can inspect the weights again:
model.get_layer("first").trainable_weights[0]
<tf.Variable 'first/kernel:0' shape=(2, 2) dtype=float32, numpy=
array([[-0.08515069, -0.51738167],
[ 0.8938798 , 0.6792033 ]], dtype=float32)>
And only the first row changed.
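If you want to freeze just the single (1,1) element rather than a whole row, the same trick works with an element-wise mask. This is only a sketch reusing the names from the loop above, and it assumes the frozen entry was initialized to the value you want it to keep (e.g. zero):
# 1. marks trainable entries, 0. marks the entry that must never be updated
freeze_mask = tf.constant([[0., 1.],
                           [1., 1.]])
w = model.get_layer("first").trainable_weights[0]
for xx, yy in ds.take(1):
    with tf.GradientTape() as tape:
        l = loss(yy, model(xx))
    g = tape.gradient(l, w)
    opt.apply_gradients([(g * freeze_mask, w)])  # the masked entry keeps its current value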
I am getting an error on fit_generator. My generator returns the following:
yield(row.values, label)
For example, using it:
myg = generate_array()
for i in myg:
    print(i[0].shape)
    print(i)
    break
(9008,)
(array([0.116516, 0.22419 , 0.03373 , ..., 0. , 0. , 0. ]), 0)
But the following throws an exception:
model = Sequential()
model.add(Dense(84, activation='relu', input_dim=9008))
ValueError: Error when checking input: expected dense_1_input to have shape
(9008,) but got array with shape (1,)
Any idea?
As suggested by Kota Mori: the data generator needs to yield a batch of data, not a single sample. See e.g.: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
Since I want stochastic gradient descent (a batch size of one), the following code fixed the problem:
def generate_array():
    while True:
        X = np.empty((1, 9008))
        y = np.empty((1,), dtype=int)
        # Some processing
        X[0] = row
        y[0] = label
        yield (X, y)
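If you later want mini-batches larger than one, the same pattern generalizes. This is only a sketch; next_sample() is a hypothetical helper that returns one (features, label) pair:
import numpy as np
def generate_array(batch_size=32):
    while True:
        X = np.empty((batch_size, 9008))
        y = np.empty((batch_size,), dtype=int)
        for i in range(batch_size):
            row, label = next_sample()  # hypothetical: fetch one (features, label) pair
            X[i] = row
            y[i] = label
        yield (X, y)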
My task is to predict the five most probable tags for a sentence. Right now I've got unscaled logits from the output (fully connected) layer:
with tf.name_scope("output"):
self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
predictions = tf.nn.top_k(self.scores, 5) # should be the k highest score
with tf.name_scope("accuracy"):
labels = input_y # its shape is (batch_size, num_classes)
# calculate the top k accuracy
Now predictions look like [3,1,2,50,12] (3, 1, ... are the indexes of the highest scores), while the labels are in "multi-hot" form: [0,1,0,1,1,0,...].
In Python, I can simply write:
correct_preds = [input_y[i]==1 for i in predictions]
weighted = np.dot(correct_preds, [5,4,3,2,1]) # weighted by rank
recall = sum(correct_preds) /sum(input_y)
precision =sum(correct_preds)/len(correct_preds)
but in TensorFlow, what should I use to complete this task?
Solution
I've coded up an example of how to do the calculations. All of the inputs in this example are coded as tf.constant but of course you can substitute your variables.
The main trick is the matrix multiplications. The first multiplies input_y, reshaped to 2-D, by a [1x5] ones matrix called to_top5. The second multiplies correct_preds by weighted_matrix.
Code
import tensorflow as tf
input_y = tf.constant( [5,2,9,1] , dtype=tf.int32 )
predictions = tf.constant( [[9,3,5,2,1],[8,9,0,6,5],[1,9,3,4,5],[1,2,3,4,5]])
to_top5 = tf.constant( [[1,1,1,1,1]] , dtype=tf.int32 )
input_y_for_top5 = tf.matmul( tf.reshape(input_y,[-1,1]) , to_top5 )
correct_preds = tf.cast( tf.equal( input_y_for_top5 , predictions ) , dtype=tf.float16 )
weighted_matrix = tf.constant( [[5.],[4.],[3.],[2.],[1.]] , dtype=tf.float16 )
weighted = tf.matmul(correct_preds,weighted_matrix)
recall = tf.reduce_sum(correct_preds) / tf.cast( tf.reduce_sum(input_y) , tf.float16)
precision = tf.reduce_sum(correct_preds) / tf.constant(5.0,dtype=tf.float16)
## training
# Run tensorflow and print the result
with tf.Session() as sess:
    print("\n\n=============\n\n")
    print("\ninput_y_for_top5")
    print(sess.run(input_y_for_top5))
    print("\ncorrect_preds")
    print(sess.run(correct_preds))
    print("\nweighted")
    print(sess.run(weighted))
    print("\nrecall")
    print(sess.run(recall))
    print("\nprecision")
    print(sess.run(precision))
    print("\n\n=============\n\n")
Output
=============
input_y_for_top5
[[5 5 5 5 5]
[2 2 2 2 2]
[9 9 9 9 9]
[1 1 1 1 1]]
correct_preds
[[ 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 0.]]
weighted
[[ 3.]
[ 0.]
[ 4.]
[ 5.]]
recall
0.17651
precision
0.6001
=============
Summary
The above example shows a batch size of 4.
The first sample in the batch has a y_label of 5, which means that the class with index 5 is the correct label for that sample. Furthermore, the prediction for the first sample is [9,3,5,2,1], which means the model thinks the class with index 9 is the most likely, index 3 is the next most likely, and so on.
If we want an example with a batch size of 3 instead, we can use the following code:
input_y = tf.constant( [5,2,9] , dtype=tf.int32 )
predictions = tf.constant( [[9,3,5,2,1],[8,9,0,6,5],[1,9,3,4,5]])
If we substitute these lines into the program above, we can see that it indeed calculates everything correctly for a batch size of 3.
Inspired by @wontonimo's answer above, I implemented a method using matrix ops, tf.reshape, and tf.gather. The label tensor is "multi-hot", e.g. [[0,1,0,1],[1,0,0,1]]. The prediction tensor is obtained with tf.nn.top_k and looks like [[3,1],[0,1]]. Here is the code:
top_k_pred = tf.nn.top_k(logits, 5)
tmp1 = tf.reshape(tf.range(batch_size) * num_classes, (-1, 1))
idx_incre = top_k_pred[1] + tf.concat([tmp1] * 5, 1)  # offsets to index into the flattened labels
correct_preds = tf.gather(tf.reshape(y_label, (-1,)), tf.reshape(idx_incre, (-1,)))
correct_preds = tf.reshape(correct_preds, (batch_size, 5))
weighted = correct_preds * [[5, 4, 3, 2, 1]]  # weighted by rank
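To finish the recall/precision part in the same style as the first answer, here is a sketch (assuming the tensors above, with y_label holding 0/1 entries):
correct_f = tf.cast(correct_preds, tf.float32)
recall = tf.reduce_sum(correct_f) / tf.cast(tf.reduce_sum(y_label), tf.float32)
precision = tf.reduce_sum(correct_f) / tf.cast(batch_size * 5, tf.float32)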
I'd like to reset (randomize) the weights of all layers in my Keras (deep learning) model. The reason is that I want to be able to train the model several times with different data splits without having to do the (slow) model recompilation every time.
Inspired by this discussion, I'm trying the following code:
# Reset weights
for layer in KModel.layers:
    if hasattr(layer, 'init'):
        input_dim = layer.input_shape[1]
        new_weights = layer.init((input_dim, layer.output_dim), name='{}_W'.format(layer.name))
        layer.trainable_weights[0].set_value(new_weights.get_value())
However, it only partly works.
Partly, because I've inspected some layer.get_weights() values, and they seem to change. But when I restart the training, the cost values are much lower than the initial cost values on the first run. It's almost like I've succeeded in resetting some of the weights, but not all of them.
Save the initial weights right after compiling the model but before training it:
model.save_weights('model.h5')
and then after training, "reset" the model by reloading the initial weights:
model.load_weights('model.h5')
This gives you an apples-to-apples model for comparing the different data sets, and it should be quicker than recompiling the entire model.
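For instance, a minimal sketch of the several-splits workflow (assuming model is already compiled and splits is a hypothetical list of (x, y) pairs):
model.save_weights('model.h5')  # snapshot of the freshly initialized weights
for x_train, y_train in splits:  # hypothetical list of data splits
    model.load_weights('model.h5')  # start every run from the same initial weights
    model.fit(x_train, y_train, epochs=10)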
Reset all layers by checking for initializers:
def reset_weights(model):
    import keras.backend as K
    session = K.get_session()
    for layer in model.layers:
        if hasattr(layer, 'kernel_initializer'):
            layer.kernel.initializer.run(session=session)
        if hasattr(layer, 'bias_initializer'):
            layer.bias.initializer.run(session=session)
Update: kernel_initializer is kernel.initializer now.
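Usage is then simply reset_weights(model). Note that this relies on K.get_session(), i.e. a TF1-style session, so it applies to Keras running on the TensorFlow 1.x backend (or tf.compat.v1).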
If you want to truly re-randomize the weights, and not merely restore the initial weights, you can do the following. The code is slightly different depending on whether you're using TensorFlow or Theano.
from keras.initializers import glorot_uniform # Or your initializer of choice
import keras.backend as K
initial_weights = model.get_weights()
backend_name = K.backend()
if backend_name == 'tensorflow':
    k_eval = lambda placeholder: placeholder.eval(session=K.get_session())
elif backend_name == 'theano':
    k_eval = lambda placeholder: placeholder.eval()
else:
    raise ValueError("Unsupported backend")
new_weights = [k_eval(glorot_uniform()(w.shape)) for w in initial_weights]
model.set_weights(new_weights)
I have found the clone_model function, which creates a cloned network with the same architecture but newly initialized weights.
Example of use:
model_cloned = tensorflow.keras.models.clone_model(model_base)
Comparing the weights:
original_weights = model_base.get_weights()
print("Original weights", original_weights)
print("========================================================")
print("========================================================")
print("========================================================")
model_cloned = tensorflow.keras.models.clone_model(model_base)
new_weights = model_cloned.get_weights()
print("New weights", new_weights)
If you execute this code several times, you will notice that the cloned model receives new weights each time.
Tensorflow 2 answer:
for ix, layer in enumerate(model.layers):
    if hasattr(model.layers[ix], 'kernel_initializer') and \
            hasattr(model.layers[ix], 'bias_initializer'):
        weight_initializer = model.layers[ix].kernel_initializer
        bias_initializer = model.layers[ix].bias_initializer
        old_weights, old_biases = model.layers[ix].get_weights()
        model.layers[ix].set_weights([
            weight_initializer(shape=old_weights.shape),
            bias_initializer(shape=old_biases.shape)])
Original weights:
model.layers[1].get_weights()[0][0]
array([ 0.4450057 , -0.13564804, 0.35884023, 0.41411972, 0.24866664,
0.07641453, 0.45726687, -0.04410008, 0.33194816, -0.1965386 ,
-0.38438258, -0.13263905, -0.23807487, 0.40130925, -0.07339832,
0.20535922], dtype=float32)
New weights:
model.layers[1].get_weights()[0][0]
array([-0.4607593 , -0.13104361, -0.0372932 , -0.34242013, 0.12066692,
-0.39146423, 0.3247317 , 0.2635846 , -0.10496247, -0.40134245,
0.19276887, 0.2652442 , -0.18802321, -0.18488845, 0.0826562 ,
-0.23322225], dtype=float32)
K.get_session().close()
K.set_session(tf.Session())
K.get_session().run(tf.global_variables_initializer())
Try set_weights.
For example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import numpy as np
np.random.seed(1234)
from keras.layers import Input
from keras.layers.convolutional import Convolution2D
from keras.models import Model
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
for layer_i in range(len(model_network.layers)):
    print(model_network.layers[layer_i])
for layer_i in range(1, len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Input:")
print(input_mat)
print("Output:")
print(model_network.predict(input_mat))
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1, len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Build a model with, say, two convolutional layers:
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
Then define your weights (I'm using a simple w here, but you could use np.random.uniform or anything like that if you want):
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
Take a peek at what the layers inside the model are:
for layer_i in range(len(model_network.layers)):
    print(model_network.layers[layer_i])
Set the weights for each convolutional layer (you'll see that the first layer is actually the input layer and you don't want to change that, which is why the range starts at 1, not 0):
for layer_i in range(1, len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
Generate some input for your test and predict the output from your model
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Output:")
print(model_network.predict(input_mat))
You could change the weights again if you want and check the output again:
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1, len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Sample output:
Using Theano backend.
Building Model...
<keras.engine.topology.InputLayer object at 0x7fc0c619fd50>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6166250>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6150a10>
Weights after change:
[array([[[[ 0., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 0.]]]], dtype=float32)]
Input:
[[[[ 1. 2. 3. 10.]
[ 4. 5. 6. 11.]
[ 7. 8. 9. 12.]]]]
Output:
[[[[ 4. 8. 12. 40.]
[ 16. 20. 24. 44.]
[ 28. 32. 36. 48.]]]]
Output:
[[[[ 9. 18. 27. 90.]
[ 36. 45. 54. 99.]
[ 63. 72. 81. 108.]]]]
From your peek at .layers you can see that the first layer is the input layer and the others are your convolutional layers.
For TF2, the simplest way to actually reset the weights would be:
tf_model.set_weights(
    clone_model(tf_model).get_weights()
)
clone_model(), as mentioned by @danielsaromo, returns a new model with trainable parameters initialized from scratch. We use its weights to reinitialize our model, so no model compilation (knowledge of its loss or optimizer) is needed.
There are two caveats, though. The first is mentioned in clone_model()'s documentation:
clone_model will not preserve the uniqueness of shared objects within the model (e.g. a single variable attached to two distinct layers will be restored as two separate variables).
The other caveat is that for large models, cloning might fail due to memory limits.
To "random" re-initialize weights of a compiled untrained model in TF 2.0 (tf.keras):
weights = [glorot_uniform(seed=random.randint(0, 1000))(w.shape) if w.ndim > 1 else w for w in model.get_weights()]
Note the "if wdim > 1 else w". You don't want to re-initialize the biases (they stay 0 or 1).
Use keras.backend.clear_session().