Computing gradient of the model with modified weights

I am implementing Sharpness-Aware Minimization (SAM) using TensorFlow. The algorithm, simplified, is as follows:
1. Compute the gradient using the current weights W
2. Compute ε according to the equation in the paper
3. Compute the gradient using the weights W + ε
4. Update the model using the gradient from step 3
I have already implemented steps 1 and 2, but I am having trouble implementing step 3 in the code below.
def train_step(self, data, rho=0.05, p=2, q=2):
    if (1 / p) + (1 / q) != 1:
        raise tf.python.framework.errors_impl.InvalidArgumentError('p, q must be specified so that 1/p + 1/q = 1')
    x, y = data
    # compute first backprop
    with tf.GradientTape() as tape:
        y_pred = self(x, training=True)
        loss = self.compiled_loss(y, y_pred)
    trainable_vars = self.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    # compute neighborhoods (epsilon_hat) from first backprop
    trainable_w_plus_epsilon_hat = [
        w + (rho * tf.sign(loss) * (tf.pow(tf.abs(g), q-1) / tf.math.pow(tf.norm(g, ord=q), q / p)))
        for w, g in zip(trainable_vars, gradients)
    ]
    ### HOW TO SET TRAINABLE WEIGHTS TO `w_plus_epsilon_hat`?
    #
    # TODO:
    # 1. compute gradient using trainable weights from `trainable_w_plus_epsilon_hat`
    # 2. update `trainable_vars` using gradient from step 1
    #
    #########################################################
    self.compiled_metrics.update_state(y, y_pred)
    return {m.name: m.result() for m in self.metrics}
Is there any way to compute the gradient using the trainable weights from trainable_w_plus_epsilon_hat?
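
One common way to do steps 3 and 4 is to perturb the variables in place with assign_add, take a second gradient, and then undo the perturbation with assign_sub before updating. The following is a minimal sketch under stated assumptions, not a drop-in train_step: it assumes p = q = 2 (so ε = ρ · g / ‖g‖₂ with the norm taken over all variables), it uses a toy quadratic loss, and the helper name sam_step and the plain SGD update are hypothetical stand-ins for the model and its optimizer.

```python
import tensorflow as tf

def sam_step(variables, loss_fn, rho=0.05, lr=0.1):
    """One SAM update for p = q = 2 (hypothetical helper, plain SGD update)."""
    # Step 1: gradient at the current weights W.
    with tf.GradientTape() as tape:
        loss = loss_fn()
    grads = tape.gradient(loss, variables)
    # Step 2: epsilon = rho * g / ||g||_2, using the norm over all variables.
    grad_norm = tf.linalg.global_norm(grads)
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    # Step 3: move the variables to W + epsilon and take the gradient there.
    for v, e in zip(variables, eps):
        v.assign_add(e)
    with tf.GradientTape() as tape:
        perturbed_loss = loss_fn()
    sam_grads = tape.gradient(perturbed_loss, variables)
    # Step 4: restore W, then update it with the gradient from step 3.
    for v, e, g in zip(variables, eps, sam_grads):
        v.assign_sub(e)
        v.assign_sub(lr * g)
    return loss

# Toy problem (assumption for illustration): minimize sum(w^2).
w = tf.Variable([1.0, -2.0])
loss_fn = lambda: tf.reduce_sum(tf.square(w))
initial_loss = float(loss_fn())
for _ in range(50):
    sam_step([w], loss_fn)
final_loss = float(loss_fn())
```

Inside a Keras train_step the same pattern would presumably use self.trainable_variables as the variable list and self.optimizer.apply_gradients for the final update, in place of the manual SGD line.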

Related

How to use TensorBoard to visualize functions

I am new to TensorFlow 2.9 and have written a function that performs linear regression. I ran into problems when trying to visualize this function with TensorBoard. I know how to record data, but I don't know how to generate a graph with tf.summary.trace_on.
Here is my code.
def linear_regression_1():
    writer = tf.summary.create_file_writer("./tmp/linear")
    x = tf.random.normal(shape=[100, 1])
    y_true = tf.matmul(x, [[0.8]]) + 0.7
    weights = tf.Variable(initial_value=tf.random.normal(shape=[1, 1]))
    bias = tf.Variable(initial_value=tf.random.normal(shape=[1, 1]))
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
    with writer.as_default():
        for i in range(1000):
            # tf.print('weights:', weights)
            # tf.print('bias:', bias)
            tf.summary.histogram('weights', weights, i)
            tf.summary.histogram('bias', bias, i)
            with tf.GradientTape() as tape:
                y_predict = tf.matmul(x, weights) + bias
                error = tf.reduce_mean(tf.square(y_predict - y_true))
            tf.summary.histogram('error', error, i)
            gradients = tape.gradient(error, [weights, bias])
            optimizer.apply_gradients(zip(gradients, [weights, bias]))
    print('weights:', weights)
    print('bias:', bias)
linear_regression_1()
When I put @tf.function before this function, it just reports errors.
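
One likely cause of the errors is that the decorated function creates its tf.Variable objects (and the optimizer and writer) every time it is traced, which tf.function does not allow. A minimal sketch of one way to get a graph into TensorBoard, assuming the variables are created once outside and only the forward computation is decorated; the function name predict, the trace name, and the temporary log directory are hypothetical choices for illustration.

```python
import os
import tempfile
import tensorflow as tf

# Variables are created once, outside the traced function.
weights = tf.Variable(tf.random.normal(shape=[1, 1]))
bias = tf.Variable(tf.random.normal(shape=[1, 1]))

@tf.function
def predict(x):
    # Only the computation lives inside the tf.function.
    return tf.matmul(x, weights) + bias

logdir = tempfile.mkdtemp()  # hypothetical log directory
writer = tf.summary.create_file_writer(logdir)

x = tf.random.normal(shape=[100, 1])
tf.summary.trace_on(graph=True)   # start recording graphs
y_predict = predict(x)            # the first call traces the function
with writer.as_default():
    # Write the recorded graph to the event file.
    tf.summary.trace_export(name="linear_regression_graph", step=0)
writer.flush()
```

Pointing TensorBoard at the log directory should then show the traced function under the Graphs tab.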

Train a neural network when the training has only the derivative of output wrt all inputs

There is a scalar function F with 1000 inputs. I want to train a model to predict F given the inputs. However, in the training dataset, we only know the derivative of F with respect to each input, not the value of F itself. How can I construct a neural network with this limitation in TensorFlow or PyTorch?

I think you can use torch.autograd to compute the gradients, and then use them for the loss. You need:
(a) A trainable nn.Module to represent the (unknown) function F:
import torch
from torch import nn, autograd
class UnknownF(nn.Module):
    def __init__(self, ...):
        super().__init__()
        # whatever combinations of linear layers and activations and whatever...
    def forward(self, x):
        # x is 1000 dim vector
        y = self.layers(x)
        # y is a _scalar_ output
        return y
model = UnknownF(...)  # instantiate the model of the unknown function
(b) Training data:
x = torch.randn(n, 1000, requires_grad=True) # n examples of 1000-dim vectors
dy = torch.randn(n, 1000) # the corresponding 1000-dim gradients of the n inputs
(c) An optimizer:
opt = torch.optim.SGD(model.parameters(), lr=0.1)
(d) Put it together:
criterion = nn.MSELoss()
for e in range(num_epochs):
    for i in range(n):
        # batch size = 1, pick one example
        x_ = x[i, :]
        dy_ = dy[i, :]
        opt.zero_grad()
        # predict the unknown output
        y_ = model(x_)
        # compute the gradients of the model using autograd:
        pred_dy_ = autograd.grad(y_, x_, create_graph=True)[0]
        # compute the loss between the model's gradients and the GT ones:
        loss = criterion(pred_dy_, dy_)
        loss.backward()
        opt.step()  # update model's parameters accordingly.

Logistic regression from scratch

I am implementing multinomial logistic regression using gradient descent + L2 regularization on the MNIST dataset.
My training data is a dataframe with shape (n_samples=1198, features=65).
On each iteration of gradient descent, I take a linear combination of the weights and inputs to obtain 1198 activations (beta^T * X). I then pass these activations through a softmax function. However, I am confused about how I would obtain a probability distribution over 10 output classes for each activation?
My weights are initialized as such
n_features = 65
# init random weights
beta = np.random.uniform(0, 1, n_features).reshape(1, -1)
This is my current implementation.
def softmax(x:np.ndarray):
    exps = np.exp(x)
    return exps/np.sum(exps, axis=0)
def cross_entropy(y_hat:np.ndarray, y:np.ndarray, beta:np.ndarray) -> float:
    """
    Computes cross entropy for multiclass classification
    y_hat: predicted classes, n_samples x n_feats
    y: ground truth classes, n_samples x 1
    """
    n = len(y)
    return - np.sum(y * np.log(y_hat) + beta**2 / n)
def gd(X:pd.DataFrame, y:pd.Series, beta:np.ndarray,
       lr:float, N:int, iterations:int) -> (np.ndarray,np.ndarray):
    """
    Gradient descent
    """
    n = len(y)
    cost_history = np.zeros(iterations)
    for it in range(iterations):
        activations = X.dot(beta.T).values
        y_hat = softmax(activations)
        cost_history[it] = cross_entropy(y_hat, y, beta)
        # gradient of weights
        grads = np.sum((y_hat - y) * X).values
        # update weights
        beta = beta - lr * (grads + 2/n * beta)
    return beta, cost_history

In Multinomial Logistic Regression, you need a separate set of parameters (the pixel weights in your case) for every class. The probability of an instance belonging to a certain class is then estimated as the softmax function of the instance's score for that class. The softmax function makes sure that the estimated probabilities sum to 1 over all classes.
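
A minimal sketch of what that looks like in NumPy, assuming the shapes from the question (1198 samples, 65 features, 10 classes). The random X and y are stand-ins for the real data, and beta becomes a matrix with one column of weights per class, so each sample gets 10 scores and the softmax is taken across those 10 scores (axis=1), not across samples.

```python
import numpy as np

def softmax(scores):
    # Subtract the row-wise max for numerical stability, then normalize each row.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 1198, 65, 10

# Stand-in data (assumption): replace with the real features and labels.
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, n_classes, size=n_samples)

# One weight vector per class: shape (n_features, n_classes).
beta = rng.uniform(0, 1, size=(n_features, n_classes))

probs = softmax(X @ beta)        # (n_samples, n_classes), each row sums to 1
Y = np.eye(n_classes)[y]         # one-hot labels, same shape as probs

# Gradient of the mean cross-entropy with respect to beta (no regularization).
grad = X.T @ (probs - Y) / n_samples
```

The gradient has the same shape as beta, so the update beta = beta - lr * grad (plus the L2 term) carries over from the binary case unchanged.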

How to integrate previous predictions in Keras custom loss function?

I want to create a custom loss function in Keras, where the loss of the current prediction y_pred depends on the prediction for the previous training sample XXX and other parameters h, b, K. The loss function looks like the following, and I don't know how to access the previous prediction (i.e. replace XXX) in Keras during training.
(for context: it's a loss function for quantile regression + fixed costs)
def custom_loss(y_true, y_pred, h, b, K, XXX):
    if y_pred > XXX:
        F = K
    else:
        F = 0
    loss = h * max(0, y_pred - y_true) + b * max(0, y_true - y_pred) + F
    return loss
Thanks for your help!
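
One possible approach, sketched under a strong assumption: if the samples within a batch are kept in order (no shuffling), the "previous prediction" can be read from the previous row of y_pred inside the loss itself. The factory name make_fixed_cost_loss is hypothetical, the first row of each batch is compared with itself (so it never incurs the fixed cost), and Python's if/max are replaced by tensor operations because y_pred is a tensor.

```python
import tensorflow as tf

def make_fixed_cost_loss(h, b, K):
    """Build a loss closure; h, b, K are the cost parameters from the question."""
    def loss_fn(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # Assumption: row i-1 of the batch is the previous training sample.
        # The first row has no predecessor, so it is paired with itself.
        prev_pred = tf.concat([y_pred[:1], y_pred[:-1]], axis=0)
        # Fixed cost K wherever the prediction exceeds the previous one.
        fixed = K * tf.cast(y_pred > prev_pred, y_pred.dtype)
        over = h * tf.maximum(0.0, y_pred - y_true)
        under = b * tf.maximum(0.0, y_true - y_pred)
        return tf.reduce_mean(over + under + fixed)
    return loss_fn

# Small check with hand-computable values.
loss_fn = make_fixed_cost_loss(h=1.0, b=2.0, K=10.0)
y_true = tf.constant([[1.0], [2.0], [3.0]])
y_pred = tf.constant([[1.0], [3.0], [2.0]])
value = float(loss_fn(y_true, y_pred))
```

The returned closure has the (y_true, y_pred) signature that model.compile(loss=...) expects. Note that the fixed-cost term is a step function of y_pred, so it contributes no gradient; it changes the reported loss but not the direction of the updates.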

Linear regression and autograd

Let $F \in \mathbb{R}^{S \times F}$ be a matrix of features, I want to classify them using logistic regression with autograd [1]. The code I am using is similar to the one in the following example [2].
The only thing I want to change is that I have an additional weight matrix $W$ in $\mathbb{R}^{F \times L}$ that I want to apply to each feature. So each feature is multiplied with $W$ and then fed into the logistic regression.
Is it somehow possible to train $W$ and the weights of the logistic regression simultaneously using autograd?
I have tried the following code, but unfortunately the weights stay at 0.
import autograd.numpy as np
from autograd import grad
global inputs
def sigmoid(x):
    return 0.5 * (np.tanh(x) + 1)
def logistic_predictions(weights, inputs):
    # Outputs probability of a label being true according to logistic model.
    return sigmoid(np.dot(inputs, weights))
def training_loss(weights):
    global inputs
    # Training loss is the negative log-likelihood of the training labels.
    feature_weights = weights[3:]
    feature_weights = np.reshape(feature_weights, (3, 3))
    inputs = np.dot(inputs, feature_weights)
    preds = logistic_predictions(weights[0:3], inputs)
    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    return -np.sum(np.log(label_probabilities))
# Build a toy dataset.
inputs = np.array([[0.52, 1.12, 0.77],
                   [0.88, -1.08, 0.15],
                   [0.52, 0.06, -1.30],
                   [0.74, -2.49, 1.39]])
targets = np.array([True, True, False, True])
# Define a function that returns gradients of training loss using autograd.
training_gradient_fun = grad(training_loss)
# Optimize weights using gradient descent.
weights = np.zeros([3 + 3 * 3])
print("Initial loss:", training_loss(weights))
for i in range(100):
    print(i)
    print(weights)
    weights -= training_gradient_fun(weights) * 0.01
print("Trained loss:", training_loss(weights))
[1] https://github.com/HIPS/autograd
[2] https://github.com/HIPS/autograd/blob/master/examples/logistic_regression.py

Typical practice is to concatenate all "vectorized" parameters into the decision variables vector.
If you update logistic_predictions to include the W matrix, via something like
def logistic_predictions(weights_and_W, inputs):
    '''
    Here, :arg weights_and_W: is an array of the form [weights W.ravel()]
    '''
    # Outputs probability of a label being true according to logistic model.
    weights = weights_and_W[:inputs.shape[1]]
    W_raveled = weights_and_W[inputs.shape[1]:]
    n_W = len(W_raveled)
    # Integer division so the reshape gets an int under Python 3.
    W = W_raveled.reshape(inputs.shape[1], n_W // inputs.shape[1])
    return sigmoid(np.dot(np.dot(inputs, W), weights))
then simply change training_loss to (from the original source example)
def training_loss(weights_and_W):
    # Training loss is the negative log-likelihood of the training labels.
    preds = logistic_predictions(weights_and_W, inputs)
    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    return -np.sum(np.log(label_probabilities))
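
A point worth checking, offered as a likely explanation and not a confirmed diagnosis: when both W and the logistic weights start at exactly zero, the gradient of the product inputs · W · weights is zero with respect to both, so gradient descent never moves. The sketch below uses plain NumPy with hand-derived gradients (not autograd) on the toy dataset from the question; the function name loss_and_grads, the random initialization scale, and the step count are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 0.5 * (np.tanh(x) + 1)  # equals 1 / (1 + exp(-2x))

def loss_and_grads(weights, W, inputs, targets):
    """Negative log-likelihood and its gradients w.r.t. weights and W."""
    hidden = inputs @ W                 # (n_samples, L)
    z = hidden @ weights                # (n_samples,)
    preds = sigmoid(z)
    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    loss = -np.sum(np.log(label_probabilities))
    dz = 2.0 * (preds - targets)        # d loss / d z for this tanh-based sigmoid
    grad_weights = hidden.T @ dz        # vanishes when W is all zeros
    grad_W = np.outer(inputs.T @ dz, weights)  # vanishes when weights are zero
    return loss, grad_weights, grad_W

inputs = np.array([[0.52, 1.12, 0.77],
                   [0.88, -1.08, 0.15],
                   [0.52, 0.06, -1.30],
                   [0.74, -2.49, 1.39]])
targets = np.array([1.0, 1.0, 0.0, 1.0])

# All-zero start: both gradients are exactly zero, so nothing can change.
_, g_w0, g_W0 = loss_and_grads(np.zeros(3), np.zeros((3, 3)), inputs, targets)

# Small random start: the gradients are non-zero and the loss can decrease.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=3)
W = rng.normal(scale=0.1, size=(3, 3))
initial_loss, _, _ = loss_and_grads(weights, W, inputs, targets)
for _ in range(200):
    _, g_w, g_W = loss_and_grads(weights, W, inputs, targets)
    weights -= 0.01 * g_w
    W -= 0.01 * g_W
final_loss, _, _ = loss_and_grads(weights, W, inputs, targets)
```

With autograd the same effect can be obtained by initializing the concatenated parameter vector with small random values in place of np.zeros.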
