I have trained a model using Keras (with tf as the backend) as follows:
from keras.models import Sequential
from keras.layers import InputLayer, Convolution1D, MaxPooling1D, Flatten, Dense

activation = 'relu'
initializer = 'he_normal'
n_hidden = [256, 128]
n_filters = [32]
input_shape = (batch_size, 7213, 1)
model = Sequential([
    InputLayer(batch_input_shape=input_shape),
    Convolution1D(nb_filter=n_filters[0], filter_length=8, activation=activation, border_mode='same', init=initializer, input_shape=input_shape),
    MaxPooling1D(pool_length=4),
    Flatten(),
    Dense(output_dim=n_hidden[0], activation=activation, init=initializer),
    Dense(output_dim=n_hidden[1], activation=activation, init=initializer),
    Dense(output_dim=3, input_dim=n_hidden[1], activation='linear'),
])
I need to build a Theano function that replicates model.predict() with my saved weights, in order to return the Jacobian matrix of the outputs w.r.t. the inputs.
The following NumPy code gives the same results as model.predict():
import numpy as np

pool_length = 4
x_ = test_X_data.reshape(batch_size, 7213)
weights_ = model.get_weights()  # obtain model weights
#reshape CNN weights and bias weights
weights_[0] = np.reshape(weights_[0], (weights_[0].shape[0], weights_[0].shape[3]))
weights_[1] = np.reshape(weights_[1], (1, weights_[1].shape[0]))
weights_[3] = np.reshape(weights_[3], (1, weights_[3].shape[0]))
weights_[5] = np.reshape(weights_[5], (1, weights_[5].shape[0]))
weights_[7] = np.reshape(weights_[7], (1, weights_[7].shape[0]))
# pad left and right sides of input
x_padded = np.pad(x_, ((0, 0), (3, 4)), mode='constant')
# compute Conv1d layer with bias weights
prediction = np.zeros((x_.shape[0],x_.shape[1],weights_[0].shape[1]))
for i in range(x_.shape[1]):
    prediction[:,i] = np.dot(x_padded[:,i:i+8],weights_[0])+weights_[1]
# RELU activation
prediction[prediction<0]=0
# Max pooling layer
pred_temp = np.zeros((prediction.shape[0],prediction.shape[1]//pool_length,prediction.shape[2]))
for i in range(prediction.shape[2]):
    for j in range(prediction.shape[1]//pool_length):
        pred_temp[:,j,i] = np.max(prediction[:,j*4:(j+1)*4,i],axis=1)
prediction = pred_temp.reshape(pred_temp.shape[0],pred_temp.shape[1]*pred_temp.shape[2])
# Dense layers
weights=np.vstack([weights_[2],weights_[3]])
prediction=np.hstack([prediction,np.zeros(prediction.shape[0]).reshape(prediction.shape[0],1)])
prediction[:,-1]=1
prediction=np.dot(prediction,weights)
prediction[prediction<0]=0
weights=np.vstack([weights_[4],weights_[5]])
prediction=np.hstack([prediction,np.zeros(prediction.shape[0]).reshape(prediction.shape[0],1)])
prediction[:,-1]=1
prediction=np.dot(prediction,weights)
prediction[prediction<0]=0
weights=np.vstack([weights_[6],weights_[7]])
prediction=np.hstack([prediction,np.zeros(prediction.shape[0]).reshape(prediction.shape[0],1)])
prediction[:,-1]=1
prediction= np.dot(prediction,weights)
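As a quick sanity check (assuming test_X_data has shape (batch_size, 7213, 1), i.e. the array fed to Keras), the hand-rolled forward pass can be compared against Keras directly:
# the NumPy forward pass should match model.predict() up to float precision
keras_prediction = model.predict(test_X_data)
np.testing.assert_allclose(prediction, keras_prediction, rtol=1e-4, atol=1e-5)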
Below is my attempt at turning this into a Theano function to compute the Jacobian, but I believe the for loops are making it too slow to compile.
import theano
import theano.tensor as T
import theano.typed_list

theano.config.optimizer = 'fast_compile'
theano.config.exception_verbosity = 'high'
weights_in_model = theano.typed_list.TypedListType(theano.tensor.dmatrix)()
x = T.matrix('x')
def pred_jac(x, weights_in_model):
    pool_length = 4
    # pad left and right sides of input along axis 1, matching the NumPy version
    x = T.concatenate((T.zeros((T.shape(x)[0], 3)), x, T.zeros((T.shape(x)[0], 4))), axis=1)
    # Apply convolution weights to input
    prediction = []
    for i in range(7213):
        prediction.append(T.dot(x[:, i:i+8], weights_in_model[0]) + weights_in_model[1])
    prediction = T.as_tensor_variable(prediction)
    prediction = T.clip(prediction, 0, 9999.)  # RELU activation
    prediction = prediction.dimshuffle(1, 0, 2)  # dimshuffle returns a new variable, so reassign: (batch, 7213, 32)
    # Maxpooling layer
    pred_temp = []
    for i in range(32):
        pred_temp_b = []
        for j in range(1803):
            pred_temp_b.append(T.max(prediction[:, j*pool_length:(j+1)*pool_length, i], axis=1))
        pred_temp.append(T.as_tensor_variable(pred_temp_b))
    pred_temp = T.as_tensor_variable(pred_temp)
    pred_temp = pred_temp.dimshuffle(2, 1, 0)  # (batch, 1803, 32), matching the NumPy version
    # Dense layers
    prediction = T.reshape(pred_temp, (T.shape(pred_temp)[0], T.shape(pred_temp)[1]*T.shape(pred_temp)[2]))
    weights = T.concatenate((weights_in_model[2], weights_in_model[3]), axis=0)
    prediction = T.concatenate((prediction, T.ones((T.shape(prediction)[0], 1))), axis=1)
    prediction = T.dot(prediction, weights)
    prediction = T.clip(prediction, 0, 9999.)
    weights = T.concatenate((weights_in_model[4], weights_in_model[5]), axis=0)
    prediction = T.concatenate((prediction, T.ones((T.shape(prediction)[0], 1))), axis=1)
    prediction = T.dot(prediction, weights)
    prediction = T.clip(prediction, 0, 9999.)
    # third dense layer (weights 6 and 7), matching the NumPy forward pass above
    weights = T.concatenate((weights_in_model[6], weights_in_model[7]), axis=0)
    prediction = T.concatenate((prediction, T.ones((T.shape(prediction)[0], 1))), axis=1)
    prediction = denormalize(T.dot(prediction, weights))  # denormalize() is defined elsewhere
    prediction = T.flatten(prediction)
    return prediction
# Jacobian returns the first order partial derivatives of outputs w.r.t inputs
jac = theano.gradient.jacobian(pred_jac(x,weights_in_model),wrt=x)
compute_jac = theano.function([x,weights_in_model],[jac],allow_input_downcast=True)
Any suggestions on how to improve this function and/or speed up its compilation and computation times?
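One direction I am considering (a rough, untested sketch) is to drop the Python-level loops entirely: express the convolution as a single tensordot over sliding windows, and the max pooling as a reshape followed by a max, so the graph only contains a handful of ops.
import numpy as np
import theano.tensor as T

x = T.matrix('x')            # (batch, 7213)
w_conv = T.matrix('w_conv')  # (8, 32)  -- the reshaped CNN kernel, as in weights_[0]
b_conv = T.vector('b_conv')  # (32,)    -- the CNN bias (weights_[1] flattened)

x_padded = T.concatenate(
    [T.zeros((x.shape[0], 3)), x, T.zeros((x.shape[0], 4))], axis=1)

# sliding windows via an integer index array: shape (batch, 7213, 8)
idx = np.arange(7213)[:, None] + np.arange(8)[None, :]
windows = x_padded[:, idx]

# convolution as one tensordot instead of a 7213-iteration Python loop
conv = T.tensordot(windows, w_conv, axes=[[2], [0]]) + b_conv  # (batch, 7213, 32)
conv = T.maximum(conv, 0)  # ReLU

# max pooling (pool_length=4) as reshape + max; only the first 7212 positions
# are pooled, matching the NumPy version above
pooled = conv[:, :7212, :].reshape((conv.shape[0], 1803, 4, 32)).max(axis=2)
flat = pooled.reshape((pooled.shape[0], -1))  # (batch, 1803*32), feeds the dense layers
The dense layers are already vectorised above, so they could stay as they are.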
After getting a response to this question, I realized that I have a different question.
I would like to have a different objective component based on the batch that I am passing during a training step. Suppose my batch size is one, and I associate each training example with two supporter vectors that are not themselves training inputs. So I need to figure out which part of the input is currently being processed.
import numpy as np
import keras.backend as K
from keras.layers import Dense, Input
from keras.models import Model
features = np.random.rand(100, 5)
labels = np.random.rand(100, 2)
holder = np.random.rand(200, 5) # each feature gets two supporters.
iter = np.arange(start=1, stop=features.shape[0], step=1)
supporters = {}
for i,j in zip(iter, holder): #(i, i+1) represent the ith training data
    supporters[i]=j
For instance, the first two rows of supporters are for the first point in features.
features[0] [0.71444629 0.77256729 0.95375736 0.18759234 0.8207317 ]
has the following two supporters.
1: array([0.76281692, 0.18698215, 0.11687052, 0.78084761, 0.10293403]),
2: array([0.98229912, 0.08784577, 0.08109571, 0.23665783, 0.52587238])
Now, I create a simple model.
# Simple neural net with three outputs
input_layer = Input((5,))
hidden_layer = Dense(16)(input_layer)
output_layer = Dense(2)(hidden_layer)
# Model
model = Model(inputs=input_layer, outputs=output_layer)
My goal is to create a loss function like this:
def custom_loss(y_true, y_pred):
    # Normal MSE loss
    mse = K.mean(K.square(y_true - y_pred), axis=-1)
    # Assume that I properly pass the model object into the method and use the
    # predict method to get predictions with the current network weights
    new_constraint = K.sum(y_pred - model.predict(supporters))
    return (mse + new_constraint)
Then, I go ahead and compile my model.
model.compile(loss=custom_loss, optimizer='sgd')
model.fit(features, labels, epochs=1, batch_size=1)
The problem is that, since the batch size is one, I want to make sure that the loss function only considers the supporters of the current training input. For example, if I am training on the third point in features, then I want to use the fifth and sixth supporter vectors while creating new_constraint. How can I accomplish this?
You can implement it like this (I have used the TensorFlow-based Keras API, but it shouldn't matter):
import numpy as np
import tensorflow as tf
from tensorflow.keras import Input, layers, Model
from tensorflow.keras import backend as K
features = np.random.rand(100, 5)
labels = np.random.rand(100, 2)
supporters = np.random.rand(200, 5) # each feature gets two supporters.
# I will get both support vectors to iterate over
supporters_1 = supporters[::2, :]
supporters_2 = supporters[1::2, :]
print(supporters_1.shape, supporters_2.shape)
# Result -> ((100, 5), (100, 5))
# Create a tf dataset to use in training
dataset = tf.data.Dataset.from_tensor_slices(((features, supporters_1, supporters_2), labels)).batch(1)
# A look at what it returns
for i in dataset:
    print(i)
    break
'''
Result:
((<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.42834492, 0.01041871, 0.53058175, 0.69453215, 0.83901092]])>,
<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.1724601 , 0.14386688, 0.49018201, 0.13565471, 0.35159235]])>,
<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.87243349, 0.98779049, 0.98405784, 0.74069913, 0.25763667]])>),
<tf.Tensor: shape=(1, 2), dtype=float64, numpy=array([[0.20993531, 0.70153453]])>)
'''
#=========================================================
# Creating the model (Input size is 5 and not 2 in your sample so I changed it)
# Same for the label shape
input_layer = Input((5,))
hidden_layer = layers.Dense(16)(input_layer)
output_layer = layers.Dense(2)(hidden_layer)
# Model
model = Model(inputs=input_layer, outputs=output_layer)
#=========================================================
# Implementing the custom loss
# Without the `K.abs` the result can be negative and hence the `K.abs`
def custom_loss(y_true, y_pred, support_pred_1, support_pred_2):
    mse = tf.keras.losses.mse(y_true, y_pred)
    new_constraint = K.abs(K.sum(y_pred - [support_pred_1, support_pred_2]))
    return (mse + new_constraint)
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
'''
Now we create a custom training loop. In this we will get the logits
of all the inputs and then compute loss using the custom loss
function and then optimize on that loss.
'''
epochs = 10
for epoch in range(epochs):
    print("Start of epoch %d" % (epoch,))
    for step, ((features, support_1, support_2), labels) in enumerate(dataset):
        with tf.GradientTape() as tape:
            logits = model(features, training=True)
            logits_1 = model(support_1, training=True)
            logits_2 = model(support_2, training=True)
            loss_value = custom_loss(labels, logits, logits_1, logits_2)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
    print('loss_value: ', loss_value)
EDIT: There is another way to do this, as shown below:
# Everything is the same up to supporters_1, supporters_2
def combine(inputs, targets):
    features = inputs[0]
    supports1 = inputs[1]
    supports2 = inputs[2]
    # Stack the inputs as a batch
    final = tf.stack((features, supports1, supports2))
    final = tf.reshape(final, (3, 5))
    return final, targets
# Creating the dataset
dataset = tf.data.Dataset.from_tensor_slices(((features, supporters_1, supporters_2), labels)).batch(1)
dataset = dataset.map(combine, num_parallel_calls=-1)
# Check the output
for i in dataset:
    print(i)
    break
'''
(<tf.Tensor: shape=(3, 5), dtype=float64, numpy=
array([[0.35641985, 0.93025517, 0.72874829, 0.81810538, 0.46682277],
[0.95497516, 0.71722253, 0.10608685, 0.37267656, 0.94748968],
[0.04822454, 0.00480376, 0.08479184, 0.51133809, 0.38242403]])>, <tf.Tensor: shape=(1, 2), dtype=float64, numpy=array([[0.21399956, 0.97149716]])>)
'''
#================MODEL=================
input_layer = Input((5,))
hidden_layer = layers.Dense(16)(input_layer)
output_layer = layers.Dense(2)(hidden_layer)
# Model
model = Model(inputs=input_layer, outputs=output_layer)
#=======================================
# change the loss function accordingly
'''
The first row in the y_pred will be the prediction corresponding to
actual features and the rest will be predictions corresponding to
supports and hence you can change the loss function as below.
'''
def custom_loss(y_true, y_pred):
    mse = tf.keras.losses.mse(y_true, y_pred[0, :])
    new_constraint = K.abs(K.sum(y_pred[0, :] - y_pred[1:, :]))
    return (mse + new_constraint)
# Compile
model.compile(loss=custom_loss, optimizer='adam')
# train
model.fit(dataset, epochs=5)
I've looked at a few similar questions but I still don't understand how to solve my problem.
I am trying to build a CNN that estimates how many particles hit a detector, based on what's essentially an oscilloscope trace of the energy released in the detector over time.
I have 100,000 events of 1024 time samples, which I split 80/20 as train/test, like so:
import tensorflow as tf
from sklearn.model_selection import train_test_split
train_to_test_ratio=0.8 #proportion of the dataset to include in the train split
X_train,X_test,Y_train,Y_test=train_test_split(NormSignals,labels,train_size=train_to_test_ratio)
no_outputs = 14 # maximum number of particles expected
# force the labels to have 14 binary digits, one for each of the possible outputs
Y_train=tf.one_hot(Y_train,no_outputs)
Y_test=tf.one_hot(Y_test,no_outputs)
When I try to define the input shape for the network I do so like this (full CNN code below):
# Define input to neural network (tensors of 1024 time samples x 1 amplitude per sample)
inputs = keras.Input(shape=(1024,1))
But it gives me the error: "Input 0 of layer Conv_1 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 1024, 1]"
I thought the input shape was as simple as the shape of the data arrays being passed to the network. Can someone please explain what the correct shape of my data should be?
Thank you very much in advance!
Full CNN:
import numpy as np
from tensorflow import keras
# Following the architecture of the CNN from the image recognition lab (14/5/2020):
# Simple CNN:
class noiseLayer(keras.layers.Layer):
    def __init__(self, mean):
        super(noiseLayer, self).__init__()
        self.mean = mean

    def call(self, input):
        mean = self.mean
        return input + (np.random.poisson(mean))/mean
# Add data augmentation to produce a random flip of the data (the ECal is symmetrical)
# and add poissonian noise to all of the crystals - using large N and dividing by N normalises
# the noise to be approximately continuous between 0 and 1
data_augmentation = keras.Sequential([
    noiseLayer(mean=1000)
], name='DataAugm')
# Define input to neural network (tensors of 1024 time samples x 1 amplitude per sample)
inputs = keras.Input(shape=(1024,1))
#x=inputs
x = data_augmentation(inputs)
# first convolutional block
x = keras.layers.Conv2D(16, kernel_size=(3,3), name='Conv_1')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool2D((2,2), name='MaxPool_1')(x)
# second convolutional block
x = keras.layers.Conv2D(16, kernel_size=(3,3), name='Conv_2')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool2D((2,2), name='MaxPool_2')(x)
# third convolutional block
x = keras.layers.Conv2D(32, kernel_size=(3,3), name='Conv_3')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool2D((2,2), name='MaxPool_3')(x)
# Flatten output tensor of the last convolutional layer so it can be used as
# input to the dense layers
x = keras.layers.Flatten(name='Flatten')(x)
# dense network: 2 dense hidden layers with 64 neurons each, with ReLU activation
# Classifier
x = keras.layers.Dense(64, name='Dense_1')(x)
x = keras.layers.ReLU(name='ReLU_dense_1')(x)
#x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(64, name='Dense_2')(x)
x = keras.layers.ReLU(name='ReLU_dense_2')(x)
outputs = keras.layers.Dense(no_outputs, activation='softmax', name='Output')(x)
# Model definition
model = keras.Model(inputs=inputs, outputs=outputs, name='VGGlike_CNN')
# Print model summary
model.summary()
# Show model structure
keras.utils.plot_model(model, show_shapes=True)
The problem was that I was using 2D layers to try to solve a 1D problem.
After changing all the 2D layers to their 1D counterparts, the model now compiles without errors:
x = keras.layers.Conv1D(16, kernel_size=(3), name='Conv_1')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool1D((2), name='MaxPool_1')(x)
# second convolutional block
x = keras.layers.Conv1D(16, kernel_size=(3), name='Conv_2')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool1D((2), name='MaxPool_2')(x)
# third convolutional block
x = keras.layers.Conv1D(32, kernel_size=(3), name='Conv_3')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool1D((2), name='MaxPool_3')(x)
# Flatten output tensor of the last convolutional layer so it can be used as
# input to the dense layers
x = keras.layers.Flatten(name='Flatten')(x)
# dense network: 2 dense hidden layers with 64 neurons each, with ReLU activation
# Classifier
x = keras.layers.Dense(64, name='Dense_1')(x)
x = keras.layers.ReLU(name='ReLU_dense_1')(x)
#x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(64, name='Dense_2')(x)
x = keras.layers.ReLU(name='ReLU_dense_2')(x)
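To answer the original question about the data shape: Conv1D expects each input to be 3-D, (n_events, 1024, 1), so the arrays only need a trailing channel axis. A minimal sketch, assuming NormSignals (and hence X_train/X_test) is currently a 2-D array of shape (n_events, 1024):
# add a channel axis of size 1 so the data matches keras.Input(shape=(1024, 1))
X_train = X_train.reshape(-1, 1024, 1)
X_test = X_test.reshape(-1, 1024, 1)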
I am trying to create a model with a normalized cross-correlation custom layer; the code is taken from here.
import keras
from keras import backend as K
from keras.layers import Conv2D, MaxPooling2D, Dense, Input, Flatten
from keras.models import Model, Sequential
from keras.engine import InputSpec, Layer
from keras import regularizers
from keras.optimizers import SGD, Adam
from keras.utils.conv_utils import conv_output_length
from keras import activations
import numpy as np
class Normalized_Correlation_Layer(Layer):
    # create a class inherited from keras.engine.Layer.
    def __init__(self, patch_size=(5, 5),
                 dim_ordering='tf',
                 border_mode='same',
                 stride=(1, 1),
                 activation=None,
                 **kwargs):
        if border_mode != 'same':
            raise ValueError('Invalid border mode for Correlation Layer '
                             '(only "same" is supported as of now):', border_mode)
        self.kernel_size = patch_size
        self.subsample = stride
        self.dim_ordering = dim_ordering
        self.border_mode = border_mode
        self.activation = activations.get(activation)
        super(Normalized_Correlation_Layer, self).__init__(**kwargs)

    def compute_output_shape(self, input_shape):
        return (input_shape[0][0], input_shape[0][1], input_shape[0][2],
                self.kernel_size[0] * input_shape[0][2] * input_shape[0][-1])

    def get_config(self):
        config = {'patch_size': self.kernel_size,
                  'activation': self.activation.__name__,
                  'border_mode': self.border_mode,
                  'stride': self.subsample,
                  'dim_ordering': self.dim_ordering}
        base_config = super(Normalized_Correlation_Layer, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def call(self, x, mask=None):
        input_1, input_2 = x
        stride_row, stride_col = self.subsample
        inp_shape = input_1._keras_shape
        output_shape = self.compute_output_shape([inp_shape, inp_shape])
        padding_row = (int(self.kernel_size[0] / 2), int(self.kernel_size[0] / 2))
        padding_col = (int(self.kernel_size[1] / 2), int(self.kernel_size[1] / 2))
        input_1 = K.spatial_2d_padding(input_1, padding=(padding_row, padding_col))
        input_2 = K.spatial_2d_padding(input_2, padding=((padding_row[0]*2, padding_row[1]*2), padding_col))
        output_row = output_shape[1]
        output_col = output_shape[2]
        output = []
        for k in range(inp_shape[-1]):
            xc_1 = []
            xc_2 = []
            # print("here")
            for i in range(padding_row[0]):
                for j in range(output_col):
                    xc_2.append(K.reshape(input_2[:, i:i+self.kernel_size[0], j:j+self.kernel_size[1], k],
                                          (-1, 1, self.kernel_size[0]*self.kernel_size[1])))
            for i in range(output_row):
                slice_row = slice(i, i + self.kernel_size[0])
                slice_row2 = slice(i + padding_row[0], i + self.kernel_size[0] + padding_row[0])
                # print("dfg")
                for j in range(output_col):
                    slice_col = slice(j, j + self.kernel_size[1])
                    xc_2.append(K.reshape(input_2[:, slice_row2, slice_col, k],
                                          (-1, 1, self.kernel_size[0]*self.kernel_size[1])))
                    xc_1.append(K.reshape(input_1[:, slice_row, slice_col, k],
                                          (-1, 1, self.kernel_size[0]*self.kernel_size[1])))
            for i in range(output_row, output_row+padding_row[1]):
                for j in range(output_col):
                    xc_2.append(K.reshape(input_2[:, i:i+self.kernel_size[0], j:j+self.kernel_size[1], k],
                                          (-1, 1, self.kernel_size[0]*self.kernel_size[1])))
            xc_1_aggregate = K.concatenate(xc_1, axis=1)
            xc_1_mean = K.mean(xc_1_aggregate, axis=-1, keepdims=True)
            xc_1_std = K.std(xc_1_aggregate, axis=-1, keepdims=True)
            xc_1_aggregate = (xc_1_aggregate - xc_1_mean) / xc_1_std
            xc_2_aggregate = K.concatenate(xc_2, axis=1)
            xc_2_mean = K.mean(xc_2_aggregate, axis=-1, keepdims=True)
            xc_2_std = K.std(xc_2_aggregate, axis=-1, keepdims=True)
            xc_2_aggregate = (xc_2_aggregate - xc_2_mean) / xc_2_std
            xc_1_aggregate = K.permute_dimensions(xc_1_aggregate, (0, 2, 1))
            block = []
            len_xc_1 = len(xc_1)
            print("asdf")
            for i in range(len_xc_1):
                # This loop computes the product of a given patch of feature map 1
                # and the patches of feature map 2 it is correlated with.
                sl1 = slice(int(i/inp_shape[2])*inp_shape[2],
                            int(i/inp_shape[2])*inp_shape[2]+inp_shape[2]*self.kernel_size[0])
                # This selects which patches of feature map 2 are to be considered
                # for a given patch of the first feature map.
                block.append(K.reshape(K.batch_dot(xc_2_aggregate[:, sl1, :],
                                                   xc_1_aggregate[:, :, i]),
                                       (-1, 1, 1, inp_shape[2]*self.kernel_size[0])))
            block = K.concatenate(block, axis=1)
            # print("zxcv")
            block = K.reshape(block, (-1, output_row, output_col, inp_shape[2]*self.kernel_size[0]))
            output.append(block)
        output = self.activation(output)
        print(output)
        return output
My model is a combination of cross-correlation and Conv2D layers:
dt = 'float32'
def create_model():
    ip = keras.layers.Input((50, 50, 1))
    ncx1_1 = Normalized_Correlation_Layer(patch_size=(1, 1))([ip, ip])
    ncn1_1 = keras.layers.Conv2D(64, (1,1), activation='relu', dtype=dt)(ip)
    ncn2_1 = keras.layers.Conv2D(64, (1,1), activation='relu', dtype=dt)(ncx1_1)
    ncx2_1 = Normalized_Correlation_Layer(patch_size=(1, 1), dtype=dt)([ncn1_1, ncn2_1])
    # ncx2_1 = keras.layers.Reshape((50, 50, 3200))(ncx2_1)
    # Problem occurs here
    ncn3 = keras.layers.Conv2D(filters=64, kernel_size=(1,1), activation='relu', dtype=dt)(ncx2_1)
    ncn4 = keras.layers.Conv2D(12, (1,1), activation='sigmoid', dtype=dt)(ncn3)
    model = keras.models.Model(ip, ncn4)
    return model
The model up to the last cross-correlation layer is created successfully, but I get a problem with the ncn3 layer:
ValueError: number of input channels does not match corresponding dimension of filter, 50 != 3200
The output shape printed from the ncx2_1 layer while creating it is (?, 50, 50, 50), both when I print ncx2_1.shape and in the outputs returned from the call function of the layer class ([<tf.Tensor 'normalized__correlation__layer_4/Reshape_10000:0' shape=(?, 50, 50, 50) dtype=float32>]).
But the model summary shows it as (?, 50, 50, 3200) when I create the model only up to that layer, i.e. model = keras.models.Model(ip, ncx2_1).
When I reshape the layer using ncx2_1 = keras.layers.Reshape((50, 50, 3200))(ncx2_1), I can create the model successfully, but when I try to fit the data on it, I get:
InvalidArgumentError: Input to reshape is a tensor with 6250000 values, but the requested shape has 400000000
[[node reshape_1/Reshape (defined at /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1781) ]]
[[node loss/mul (defined at /usr/local/lib/python3.6/dist-packages/keras/engine/training.py:865) ]]
Here, my batch size is 50, so for a layer with (B, H, W, C) inputs of (50, 50, 50, 50), the size should be 6250000, but for (50, 50, 50, 3200) it should be 400000000, which means that the output of the cross-correlation layer actually has 50 channels.
I am either interpreting this wrong or I have made a mistake somewhere, and I would like to know which.
I am using Keras 2.1.2 with TensorFlow 1.13.1 (that is the version the custom layer was written for, and I was getting other problems with the latest version).
I am also using a custom generator, in case that is relevant, and I call fit using md.fit_generator(train_gen, verbose=1). I can add any other detail necessary.
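For reference, this is roughly how I am probing the mismatch (a sketch; it assumes ip and ncx2_1 are made accessible, e.g. returned from create_model alongside the model):
# build a probe model only up to the second correlation layer and compare the
# static shape (from compute_output_shape, reported by summary) with the
# runtime shape of an actual forward pass
probe = keras.models.Model(ip, ncx2_1)
probe.summary()  # reports (?, 50, 50, 3200)
dummy = np.zeros((50, 50, 50, 1), dtype='float32')
print(probe.predict(dummy, batch_size=50).shape)  # runtime shape; 50 channels, per the error above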
I was trying to implement a ResNet CIFAR-10 model on Google Colab, using the code from https://github.com/jzuern/cifar-classifier.
Instead of ReLU activation I'm using my own custom activation function. Here is the code:
def fonlaaf(x):
    return x/(1-tf.exp(-x))
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1, activation='fonlaaf',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder
    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)
    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=tf.keras.regularizers.l2(1e-4))
    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = fonlaaf(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = fonlaaf(x)
        x = conv(x)
    return x
def resnet_v2(input_shape, depth=20, num_classes=10):
    """ResNet Version 2 Model builder [b]
    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D, also known as
    bottleneck layers.
    First shortcut connection per layer is 1 x 1 Conv2D.
    Second and onwards shortcut connection is identity.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filter maps is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Feature map sizes:
    conv1  : 32x32, 16
    stage 0: 32x32, 64
    stage 1: 16x16, 128
    stage 2: 8x8, 256
    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)
    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (eg 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 16
    num_res_blocks = int((depth - 2) / 9)
    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True)
    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'relu'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2  # downsample
            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = tf.keras.layers.add([x, y])
        num_filters_in = num_filters_out
    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = fonlaaf(x)
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)
    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(lr=hparams.learning_rate),
                  metrics=['accuracy'])
    return model
tf.logging.set_verbosity(tf.logging.DEBUG)
resnet_model = resnet_v2((32, 32, 3), depth=56, num_classes=hparams.n_classes)
# Download and extract CIFAR-10 data
maybe_download_and_extract()
# training data
x_train, y_train = load_training_data()
# Validation data
x_val, y_val = load_validation_data()
# Testing data
x_test, y_test = load_testing_data()
# Define callbacks
callbacks = [
    tf.keras.callbacks.TensorBoard(log_dir=hparams.checkpoint_dir)
]
# This will do preprocessing and realtime data augmentation:
datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    zca_epsilon=1e-06,  # epsilon for ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    # randomly shift images horizontally (fraction of total width)
    width_shift_range=0.1,
    # randomly shift images vertically (fraction of total height)
    height_shift_range=0.1,
    # set mode for filling points outside the input boundaries
    fill_mode='nearest',
    cval=0.,  # value used for fill_mode = "constant"
    horizontal_flip=True,  # randomly flip images
    vertical_flip=False)
# Compute quantities required for feature-wise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(x_train)
# Fit the model on the batches generated by datagen.flow().
resnet_model.fit_generator(
    datagen.flow(x_train, y_train, batch_size=hparams.train_batch_size),
    epochs=hparams.n_epochs,
    validation_data=(x_val, y_val),
    workers=4,
    callbacks=callbacks)
I got the following error: ValueError: Output tensors to a Model must be the output of a TensorFlow Layer (thus holding past layer metadata). Found: Tensor("dense/Softmax:0", shape=(?, 10), dtype=float32)
Most of the previous answers to this error didn't work out for me. What am I missing here?
As the error states, you have to pass the output of a Layer. As fonlaaf() is an activation function with no state, you can use a Lambda layer.
Replace,
def fonlaaf(x):
    return x/(1-tf.exp(-x))
with
def fonlaaf(x):
    return tf.keras.layers.Lambda(lambda x: x/(1-tf.exp(-x)))(x)
https://www.tensorflow.org/guide/keras/#custom_layers
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda
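A note on the design choice: since fonlaaf is stateless, an equivalent option (a hedged alternative, not part of the answer above) is to keep the plain function and wrap each call site in an Activation layer, which also makes the output come from a Keras Layer:
# tf.keras.layers.Activation accepts a callable, so the plain function can be kept
def fonlaaf(x):
    return x / (1 - tf.exp(-x))

x = tf.keras.layers.Activation(fonlaaf)(x)  # instead of x = fonlaaf(x)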
I am implementing an OCR with Keras and the TensorFlow backend.
I want to use the keras.backend.ctc_decode implementation.
I have a model class:
import keras
def ctc_lambda_func(args):
    y_pred, y_true, input_x_width, input_y_width = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    # y_pred = y_pred[:, 2:, :]
    return keras.backend.ctc_batch_cost(y_true, y_pred, input_x_width, input_y_width)
class ModelOcropy(keras.Model):
    def __init__(self, alphabet: str):
        self.img_height = 48
        self.lstm_size = 100
        self.alphabet_size = len(alphabet)
        # check backend input shape (channel first/last)
        if keras.backend.image_data_format() == "channels_first":
            input_shape = (1, None, self.img_height)
        else:
            input_shape = (None, self.img_height, 1)
        # data input
        input_x = keras.layers.Input(input_shape, name='x')
        # training inputs
        input_y = keras.layers.Input((None,), name='y')
        input_x_widths = keras.layers.Input([1], name='x_widths')
        input_y_widths = keras.layers.Input([1], name='y_widths')
        # network
        flattened_input_x = keras.layers.Reshape((-1, self.img_height))(input_x)
        bidirectional_lstm = keras.layers.Bidirectional(
            keras.layers.LSTM(self.lstm_size, return_sequences=True, name='lstm'),
            name='bidirectional_lstm'
        )(flattened_input_x)
        dense = keras.layers.Dense(self.alphabet_size, activation='relu')(bidirectional_lstm)
        y_pred = keras.layers.Softmax(name='y_pred')(dense)
        # ctc loss
        ctc = keras.layers.Lambda(ctc_lambda_func, output_shape=[1], name='ctc')(
            [dense, input_y, input_x_widths, input_y_widths]
        )
        # init keras model
        super().__init__(inputs=[input_x, input_x_widths, input_y, input_y_widths], outputs=[y_pred, ctc])
        # ctc decoder
        top_k_decoded, _ = keras.backend.ctc_decode(y_pred, input_x_widths)
        self.decoder = keras.backend.function([input_x, input_x_widths], [top_k_decoded[0]])
        # decoded_sequences = self.decoder([test_input_data, test_input_lengths])
My use of ctc_decode comes from another post: Keras using Lambda layers error with K.ctc_decode
I get an error:
ValueError: Shape must be rank 1 but is rank 2 for 'CTCGreedyDecoder' (op: 'CTCGreedyDecoder') with input shapes: [?,?,7], [?,1].
I guess I have to squeeze my input_x_widths, but Keras does not seem to have such a function (it always outputs something like (batch_size, 1)).
Indeed, the function is expecting a 1D tensor, and you've got a 2D tensor.
Keras does have the keras.backend.squeeze(x, axis=-1) function.
And you can also use keras.backend.reshape(x, (-1,))
If you need to go back to the old shape after the operation, you can use either:
keras.backend.expand_dims(x)
keras.backend.reshape(x,(-1,1))
Complete fix :
# ctc decoder
flattened_input_x_width = keras.backend.reshape(input_x_widths, (-1,))
top_k_decoded, _ = keras.backend.ctc_decode(y_pred, flattened_input_x_width)
self.decoder = keras.backend.function([input_x, flattened_input_x_width], [top_k_decoded[0]])
# decoded_sequences = self.decoder([input_x, flattened_input_x_width])
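Usage would then look something like this (a sketch; model is a ModelOcropy instance, and test_images / test_widths are assumed NumPy arrays, with the widths flattened to 1-D to match the reshaped tensor):
# test_images must match input_x's shape; test_widths is passed as a flat 1-D array
decoded_sequences = model.decoder([test_images, test_widths.reshape(-1)])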