I have a trained LSTM model with 1 LSTM Layer and 3 Dense layers. I am using it for a sequence to One prediction. I have 4 input variables and 1 output variable. I am using the values of the last 20 timesteps to predict the next value of my output variable. The architecture of the model is shown below
model = Sequential()
model.add(LSTM(units = 120, activation ='relu', return_sequences = False,input_shape =
(train_in.shape[1],5)))
model.add(Dense(100,activation='relu'))
model.add(Dense(50,activation='relu'))
model.add(Dense(1))
The shapes of training input and training output are as shown below
train_in.shape , train_out.shape
((89264, 20, 5), (89264,))
I want to calculate the jacobian matrix for this model.
Say, Y = f(x1,x2,x3,x4) is the representation of the above neural network where:
Y -- Output variable of the trained model, f -- Is the function representing the Model; x1,x2,x3,x4 --input parameters.
How can I calculate the Jacobian Matrix?? Please share your thoughts on this. Also any valuable references if you know any.
Thank you :)
you might want to take a look at tf.GradientTape in tensorflow. Gradient tape is very simple way to auto-differentiate your computation. And the link has some basic example.
However your model is already quite big. If you have n parameters, your jacobian will have n*n values. I believe your model probably already has more than 10000 parameters. You might need to make it smaller.
I found a way to get the Jacobian matrix for LSTM model output with respect to the input. I am posting it here so that it might help someone in the future. Please share if there is any better or more simple way to do the same
import numpy as np
import pandas as pd
import tensorflow as tf
tf.compat.v1.enable_eager_execution() #This will enable eager execution which is must.
tf.executing_eagerly() #check if eager execution is enabled or not. Should give "True"
data = pd.read_excel("FileName or Location ")
#My data is in the from of dataframe with 127549 rows and 5 columns(127549*5)
a = data[:20] #shape is (20,5)
b = data[50:70] # shape is (20,5)
A = [a,b] # making a list
A = np.array(A) # convert into array size (2,20,5)
At = tf.convert_to_tensor(A, np.float32) #convert into tensor
At.shape # TensorShape([Dimension(2), Dimension(20), Dimension(5)])
model = load_model('EKF-LSTM-1.h5') # Load the trained model
# I have a trained model which is shown in the question above.
# Output of this model is a single value
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
tape.watch(At)
y1 = model(At) #defining your output as a function of input variables
print(y1,type(y1)
#output
tf.Tensor([[0.04251503],[0.04634088]], shape=(2, 1), dtype=float32) <class
'tensorflow.python.framework.ops.EagerTensor'>
jacobian=tape.jacobian(y1,At) #jacobian of output w.r.t both inputs
jacobian.shape
Outupt
TensorShape([Dimension(2), Dimension(1), Dimension(2), Dimension(20), Dimension(5)])
Here I calculated Jacobian w.r.t 2 inputs each of size (20,5). If you want to calculate w.r.t to only one input of size (20,5), then use this
jacobian=tape.jacobian(y1,At[0]) #jacobian of output w.r.t only 1st input in 'At'
jacobian.shape
Output
TensorShape([Dimension(1), Dimension(1), Dimension(1), Dimension(20), Dimension(5)])
Related
I am interested in using a neural network to estimate the parameters of a linear regression. To do this I am creating a network that makes two-parameter prediction, and I am trying to write a custom loss function that will determine how well the two parameters do as a slope and intercept in a logistic regression model, using a third dataset as a predictor in the logistic regression.
So I have a matrix of predictors X, with dimensions 10,000 by 20, and a binary outcome variable y. Additionally, I have a 10,000 observations linear_predictor that I want to use to use in the custom loss function evaluate the two outputs of the model.
import numpy as np
from tensorflow.keras import Model, Input
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import Dense
import tensorflow as tf
# create some dummy data
X = np.random.rand(10_000, 20)
y = (np.random.rand(10_000) > 0.8).astype(int)
linear_predictor = np.random.rand(10_000)
# define custom loss function
def CustomLoss(y_true, y_pred, input_):
y_estim = y_pred[:,0]*input_ + y_pred[:,1]
y_estim = tf.gather(y_pred, 0, axis=1)*input_ + tf.gather(y_pred, 1, axis=1)
return tf.keras.losses.BinaryCrossentropy(from_logits=True)(y_true, y_estim)
# create inputs to model
lp_input = Input(shape=linear_predictor.shape)
X_input = Input(shape=X.shape)
y_input = Input(shape=y.shape)
# create network
hidden1 = Dense(32, activation='relu')(X_input)
hidden2 = Dense(8, activation='relu')(hidden1)
output = Dense(2, activation='linear')(hidden2)
model = Model([y_input, X_input, lp_input], output)
# add loss function
model.add_loss(CustomLoss(y_input, output, lp_input))
# fit model
model.fit(x=X_input, y=y_input, epochs=3)
However, I am unable to get the CustomLoss function to work. Something is going wrong with subsetting the model's two-parameter output to get one parameter to use as a scalar as the slope and another to use as the intercept.
The error I am getting is:
ValueError: Exception encountered when calling layer "tf.math.multiply_1" (type TFOpLambda).
Dimensions must be equal, but are 2 and 10000 for '{{node tf.math.multiply_1/Mul}} = Mul[T=DT_FLOAT](
Placeholder, Placeholder_1)' with input shapes: [?,2], [?,10000].
Call arguments received by layer "tf.math.multiply_1" (type TFOpLambda):
• x=tf.Tensor(shape=(None, 2), dtype=float32)
• y=tf.Tensor(shape=(None, 10000), dtype=float32)
• name=None
This suggests that the variable y_pred is not being subset, even though I have tried using the method recommended here with numpy-like indexing (y_pred[:1]) as well as the gather_nd method here, among others.
I think this should be possible, any help is appreciated.
I am building a sequence to one model prediction using LSTM. My data has 4 input variables and 1 output variable which needs to be predicted. The data is a time series data. The total length of the data is 38265 (total number of timesteps). The total data is in a Data Frame of size 38265 *5
I want to use the previous 20 timesteps data of the 4 input variables to make prediction of my output variable. I am using the below code for this purpose.
model = Sequential()
model.add(LSTM(units = 120, activation ='relu', return_sequences = False,input_shape =
(train_in.shape[1],5)))
model.add(Dense(100,activation='relu'))
model.add(Dense(50,activation='relu'))
model.add(Dense(1))
I want to calculate the Jacobian of the output variable w.r.t the LSTM model function using tf.Gradient Tape .. Can anyone help me out with this??
The solution to segregate the Jacobian of the output with respect to the LSTM input can be done as follows:
Using tf.GradientTape(), we can compute the Jacobian arising from the gradient flow.
However for getting the Jacobian , the input needs to be in the form of tf.EagerTensor which is usually available when we want to see the Jacobian of the output (after executing y=model(x)). The following code snippet shares this idea:
#Get the Jacobian for each persistent gradient evaluation
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2,activation='relu'))
model.add(tf.keras.layers.Dense(2,activation='relu'))
x = tf.constant([[5., 6., 3.]])
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
# Forward pass
tape.watch(x)
y = model(x)
loss = tf.reduce_mean(y**2)
print('Gradients\n')
jacobian_wrt_loss=tape.jacobian(loss,x)
print(f'{jacobian_wrt_loss}\n')
jacobian_wrt_y=tape.jacobian(y,x)
print(f'{jacobian_wrt_y}\n')
But for getting intermediate outputs ,such as in this case, there have been many samples which use Keras. When we separate the outputs coming out from model.layers.output, we get the type to be a Keras.Tensor instead of an EagerTensor.
However for creating the Jacobian, we need the Eager Tensor. (After many failed attempts with #tf.function wrapping as eager execution is already present in TF>2.0)
So alternatively, an auxiliary model can be created with the layers required (in this case, just the Input and LSTM layers).The output of this model will be a tf.EagerTensor which will be useful for the Jacobian tensor creation. The following has been shown in this snippet:
#General Syntax for getting jacobians for each layer output
import numpy as np
import tensorflow as tf
tf.executing_eagerly()
x=tf.constant([[15., 60., 32.]])
x_inp = tf.keras.layers.Input(tensor=tf.constant([[15., 60., 32.]]))
model=tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2,activation='relu',name='dense_1'))
model.add(tf.keras.layers.Dense(2,activation='relu',name='dense_2'))
aux_model=tf.keras.Sequential()
aux_model.add(tf.keras.layers.Dense(2,activation='relu',name='dense_1'))
#model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
# Forward pass
tape.watch(x)
x_y = model(x)
act_y=aux_model(x)
print(x_y,type(x_y))
ops=[layer.output for layer in model.layers]
# ops=[layer.output for layer in model.layers]
# inps=[layer.input for layer in model.layers]
print('Jacobian of Full FFNN\n')
jacobian=tape.jacobian(x_y,x)
print(f'{jacobian[0]}\n')
print('Jacobian of FFNN with Just first Dense\n')
jacobian=tape.jacobian(act_y,x)
print(f'{jacobian[0]}\n')
Here I have used a simple FFNN consisting of 2 Dense layers, but I want to evaluate w.r.t the output of the first Dense layer. Hence I created an auxiliary model having just 1 Dense layer and determined the output of the Jacobian from it.
The details can be found here.
With the help from #Abhilash Majumder, I have done it this way. I am posting it here so that it might help someone in the future.
import numpy as np
import pandas as pd
import tensorflow as tf
tf.compat.v1.enable_eager_execution() #This will enable eager execution which is must.
tf.executing_eagerly() #check if eager execution is enabled or not. Should give "True"
data = pd.read_excel("FileName or Location ")
#My data is in the from of dataframe with 127549 rows and 5 columns(127549*5)
a = data[:20] #shape is (20,5)
b = data[50:70] # shape is (20,5)
A = [a,b] # making a list
A = np.array(A) # convert into array size (2,20,5)
At = tf.convert_to_tensor(A, np.float32) #convert into tensor
At.shape # TensorShape([Dimension(2), Dimension(20), Dimension(5)])
model = load_model('EKF-LSTM-1.h5') # Load the trained model
# I have a trained model which is shown in the question above.
# Output of this model is a single value
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
tape.watch(At)
y1 = model(At) #defining your output as a function of input variables
print(y1,type(y1)
#output
tf.Tensor([[0.04251503],[0.04634088]], shape=(2, 1), dtype=float32) <class
'tensorflow.python.framework.ops.EagerTensor'>
jacobian=tape.jacobian(y1,At) #jacobian of output w.r.t both inputs
jacobian.shape
Outupt
TensorShape([Dimension(2), Dimension(1), Dimension(2), Dimension(20), Dimension(5)])
Here I calculated Jacobian w.r.t 2 inputs each of size (20,5). If you want to calculate w.r.t to only one input of size (20,5), then use this
jacobian=tape.jacobian(y1,At[0]) #jacobian of output w.r.t only 1st input in 'At'
jacobian.shape
Output
TensorShape([Dimension(1), Dimension(1), Dimension(1), Dimension(20), Dimension(5)])
For those looking to compute the Jacobian over a series of inputs and outputs that are independent of each other for input[i], output[j], i != j, consider the batch_jacobian method.
This will reduce the number of dimensions in your computed Jacobian tensor by one and could be the difference between running out of memory and not.
See: batch_jacobian in the TensorFlow GradientTape docs.
I wish to exam the values of a tensor after mask is applied to it.
Here is a truncated part of the model. I let temp = x so later I wish to print temp to check the exact values.
So given a 4-class classification model using acoustic features. Assume I have data in (1000,50,136) as (batch, timesteps, features)
The objective is to check if the model is studying the features by timesteps. In other words, we wish to reassure the model is learning using slice as the red rectangle in the picture. Logically, it is the way for Keras LSTM layer but the confusion matrix produced is quite different when a parameter changes (eg. Dense units). The validation accuracy stays 45% thus we would like to visualize the model.
The proposed idea is to print out the first step of the first batch and print out the input in the model. If they are the same, then model is learning in the right way ((136,1) features once) instead of (50,1) timesteps of a single feature once.
input_feature = Input(shape=(X_train.shape[1],X_train.shape[2]))
x = Masking(mask_value=0)(input_feature)
temp = x
x = Dense(Dense_unit,kernel_regularizer=l2(dense_reg), activation='relu')(x)
I have tried tf.print() which brought me AttributeError: 'Tensor' object has no attribute '_datatype_enum'
As Get output from a non final keras model layer suggested by Lescurel.
model2 = Model(inputs=[input_attention, input_feature], outputs=model.get_layer('masking')).output
print(model2.predict(X_test))
AttributeError: 'Masking' object has no attribute 'op'
You want to output after mask.
lescurel's link in the comment shows how to do that.
This link to github, too.
You need to make a new model that
takes as inputs the input from your model
takes as outputs the output from the layer
I tested it with some made-up code derived from your snippets.
import numpy as np
from keras import Input
from keras.layers import Masking, Dense
from keras.regularizers import l2
from keras.models import Sequential, Model
X_train = np.random.rand(4,3,2)
Dense_unit = 1
dense_reg = 0.01
mdl = Sequential()
mdl.add(Input(shape=(X_train.shape[1],X_train.shape[2]),name='input_feature'))
mdl.add(Masking(mask_value=0,name='masking'))
mdl.add(Dense(Dense_unit,kernel_regularizer=l2(dense_reg),activation='relu',name='output_feature'))
mdl.summary()
mdl2mask = Model(inputs=mdl.input,outputs=mdl.get_layer("masking").output)
maskoutput = mdl2mask.predict(X_train)
mdloutput = mdl.predict(X_train)
maskoutput # print output after/of masking
mdloutput # print output of mdl
maskoutput.shape #(4, 3, 2): masking has the shape of the layer before (input here)
mdloutput.shape #(4, 3, 1): shape of the output of dense
New to neural nets so please correct my syntax.
I'm trying to create a LSTM RNN that will predict the Fibonacci sequence. When I ran the code below, the loss remains incredibly high (around 35339663592701874176).
Why does the shape of the input have to be (batch_size, timesteps, input_dim)? In my example I have 100 data entries so that'd be my batch_size, and the Fibonacci sequence takes in 2 inputs so that'd be input_dim but what would timesteps be in this case? 1?
Shouldn't the the units of the LSTM be 1? If I'm understanding correctly, the "units" are just the amount of hidden state nodes that are in the LSTM. So in theory, each of the 2 inputs would have a "1" coefficient weight towards that hidden state after training.
Would an RNN be a suitable model for this problem? When I've looked online, most people like to use the Fibonacci sequence as an example to explain how RNN's work.
Thanks for the help!
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Create Training Data
xs = [[[1, 1]]]
ys = []
i = 0
while i < 100:
ys.append([xs[i][0][0]+xs[i][0][1]])
xs.append([[xs[i][0][1], ys[len(ys)-1][0]]])
i = i + 1
del xs[len(xs)-1]
xs = np.array(xs, dtype=float)
ys = np.array(ys, dtype=float)
# Create Model
model = keras.Sequential()
model.add(layers.LSTM(1, input_shape=(1, 2)))
model.add(layers.Dense(1))
model.compile(optimizer="adam", loss="mean_absolute_error", metrics=[ 'accuracy' ])
# Train
model.fit(xs, ys, epochs=100000)
You can't feed a NN data where some of the values are 10^21 times as large as some of the others and expect it to work, it just doesn't happen.
You're not doing anything here that actually calls for LSTM (or any RNN), you're not actually using the time dimension, and you're basically just trying to learn addition. Maybe you meant to do something different (like input digits as a sequence, or have the output run for multiple timesteps and give you several values of the sequence), but that's not what you're doing, and it's unclear what you want.
The number of units is your memory/procesing capacity. Each unit of an RNN is able to receive values from all of the units in the previous timestep. One unit alone can't do anything interesting, especially with no layer before it to preprocess the data.
I am trying to train a network on the Swiss Roll dataset with three features X = [x1, x2, x3] for the classification task. There are four classes with labels 1, 2, 3, 4, and the vector y contains the labels for all the data.
A row in the X matrix looks like this:
-5.2146470e+00 7.0879738e+00 6.7292474e+00
The shape of X is (100, 3), and the shape of y is (100,).
I want to use Radial Basis Functions to train this model. I have used the custom RBFLayer from this StackOverflow answer (also see this explanation) to build the RBFLayer. I want to use a couple of Keras Dense layers to build the network for classification.
What I have tried so far
I have used a Dense layer for the first layer, followed by the custom RBFLayer, and two other Dense layers. Here's the code:
model = Sequential()
model.add((Dense(100, input_dim=3)))
# number of units = 10, gamma = 0.05
model.add(RBFLayer(10,0.05))
model.add(Dense(15, activation='relu'))
model.add(Dense(1, activation='softmax'))
This model gives me zero accuracy. I think there is something wrong with the model architecture, but I can't figure out what is the issue.
Also, I thought the number of units in the last Dense layer should match the number of classes, which is 4 in this case. But when I set the number of units to 4 in the last layer, I get the following error:
ValueError: Shapes (None, 1) and (None, 4) are incompatible
Can you help me with this model architecture?
I faced the same issue while practicing with multi-class classification. Where I had 7 features and the model classifies into 7 classes. I tried encoding the labels and it fixed the issue.
First import LabelEncoder class from sklearn and import to_categorical from tensorflow
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
Then, initialize an object to the LabelEncoder class and transform your labels before fitting and training the model.
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)
y = to_categorical(y)
Note that you have to use np.argmax for getting the actual predicted classification. in my case, the prediction is stored in variable called res
res = np.argmax(res, axis=None, out=None)
You can get your actual predicted class after this line. Looking forward to help you. Hope it solved your problem.
There are four classes with labels 1, 2, 3, 4, and the vector y contains the labels for all the data.
The simplest solution for input output matching is that you print the shape of the inputs and output for a single batch and then compare.
RBF layer should have no problem because output is taken from last dense layer rather then RBF layer.
With classification problem you must have last nodes equal to classes in regression the last node is 1 sometimes.
you should print
pseudo code
print(input.shape)
compare it with
print(model.input_shape)
then at output
print(output.shape)
then compare it with
print(model.predict(input).shape)
you can find the correct syntax at keras docs these are approx correct syntax / pseudo