I was implementing Sharpness Aware Minimization (SAM) using Tensorflow. The algorithm is simplified as follows
Compute gradient using current weight W
Compute ε according to the equation in the paper
Compute gradient using the weights W + ε
Update model using gradient from step 3
I have implement step 1 and 2 already, but having trouble implementing step 3 according to the code below
def train_step(self, data, rho=0.05, p=2, q=2):
if (1 / p) + (1 / q) != 1:
raise tf.python.framework.errors_impl.InvalidArgumentError('p, q must be specified so that 1/p + 1/q = 1')
x, y = data
# compute first backprop
with tf.GradientTape() as tape:
y_pred = self(x, training=True)
loss = self.compiled_loss(y, y_pred)
trainable_vars = self.trainable_variables
gradients = tape.gradient(loss, trainable_vars)
# compute neighborhoods (epsilon_hat) from first backprop
trainable_w_plus_epsilon_hat = [
w + (rho * tf.sign(loss) * (tf.pow(tf.abs(g), q-1) / tf.math.pow(tf.norm(g, ord=q), q / p)))
for w, g in zip(trainable_vars, gradients)
]
### HOW TO SET TRAINABLE WEIGHTS TO `w_plus_epsilon_hat`?
#
# TODO:
# 1. compute gradient using trainable weights from `trainable_w_plus_epsilon_hat`
# 2. update `trainable_vars` using gradient from step 1
#
#########################################################
self.compiled_metrics.update_state(y, y_pred)
return {m.name: m.result() for m in self.metrics}
Is there anyway to compute gradient using trainable weights from trainable_w_plus_epsilon_hat?
Related
i am new to tensorflow2.9 and i have finished writing a function to realize linear regression. But I faced some problems when I want to visualize this function with tensorboard.I know how to record data, but I dont know how to generate a graph with tf.summary.trace_on
Here is my code.
def linear_regression_1():
writer = tf.summary.create_file_writer("./tmp/linear")
x = tf.random.normal(shape=[100, 1])
y_true = tf.matmul(x, [[0.8]]) + 0.7
weights = tf.Variable(initial_value=tf.random.normal(shape=[1, 1]))
bias = tf.Variable(initial_value=tf.random.normal(shape=[1, 1]))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
with writer.as_default():
for i in range(1000):
# tf.print('weights:', weights)
# tf.print('bias:', bias)
tf.summary.histogram('weights', weights, i)
tf.summary.histogram('bias', bias, i)
with tf.GradientTape() as tape:
y_predict = tf.matmul(x, weights) + bias
error = tf.reduce_mean(tf.square(y_predict - y_true))
tf.summary.histogram('error', error, i)
gradients = tape.gradient(error, [weights, bias])
optimizer.apply_gradients(zip(gradients, [weights, bias]))
print('weights:', weights)
print('bias:', bias)
linear_regression_1()
when I put a #tf.function before this function, this function just report errors.
There is a scalar function F with 1000 inputs. I want to train a model to predict F given the inputs. However, in the training dataset, we only know the derivative of F with respect to each input, not the value of F itself. How I can construct a neural network with this limitation in tensorflow or pytorch?
I think you can use torch.autograd to compute the gradients, and then use them for the loss. You need:
(a) A trainable nn.Module to represent the (unknown) function F:
class UnknownF(nn.Module):
def __init__(self, ...):
# whatever combinations of linear layers and activations and whatever...
def forward(self, x):
# x is 1000 dim vector
y = self.layers(x)
# y is a _scalar_ output
return y
model = UnknownF(...) # instansiate the model of the unknown function
(b) Training data:
x = torch.randn(n, 1000, requires_grad=True) # n examples of 1000-dim vectors
dy = torch.randn(n, 1000) # the corresponding n-dim gradients of the n inputs
(c) An optimizer:
opt = torch.optim.SGD(model.parameters(), lr=0.1)
(d) Put it together:
criterion = nn.MSELoss()
for e in range(num_epochs):
for i in range(n):
# batch size = 1, pick one example
x_ = x[i, :]
dy_ = dy[i, :]
opt.zero_grad()
# predict the unknown output
y_ = model(x_)
# compute the gradients of the model using autograd:
pred_dy_ = autograd.grad(y_, x_, create_graph=True)[0]
# compute the loss between the model's gradients and the GT ones:
loss = criterion(pred_dy_, dy_)
loss.backward()
opt.step() # update model's parameters accordingly.
I am implementing multinomial logistic regression using gradient descent + L2 regularization on the MNIST dataset.
My training data is a dataframe with shape (n_samples=1198, features=65).
On each iteration of gradient descent, I take a linear combination of the weights and inputs to obtain 1198 activations (beta^T * X). I then pass these activations through a softmax function. However, I am confused about how I would obtain a probability distribution over 10 output classes for each activation?
My weights are initialized as such
n_features = 65
# init random weights
beta = np.random.uniform(0, 1, n_features).reshape(1, -1)
This is my current implementation.
def softmax(x:np.ndarray):
exps = np.exp(x)
return exps/np.sum(exps, axis=0)
def cross_entropy(y_hat:np.ndarray, y:np.ndarray, beta:np.ndarray) -> float:
"""
Computes cross entropy for multiclass classification
y_hat: predicted classes, n_samples x n_feats
y: ground truth classes, n_samples x 1
"""
n = len(y)
return - np.sum(y * np.log(y_hat) + beta**2 / n)
def gd(X:pd.DataFrame, y:pd.Series, beta:np.ndarray,
lr:float, N:int, iterations:int) -> (np.ndarray,np.ndarray):
"""
Gradient descent
"""
n = len(y)
cost_history = np.zeros(iterations)
for it in range(iterations):
activations = X.dot(beta.T).values
y_hat = softmax(activations)
cost_history[it] = cross_entropy(y_hat, y, beta)
# gradient of weights
grads = np.sum((y_hat - y) * X).values
# update weights
beta = beta - lr * (grads + 2/n * beta)
return beta, cost_history
In Multinomial Logistic Regression, you need a separate set of parameters (the pixel weights in your case) for every class. The probability of an instance belonging to a certain class is then estimated as the softmax function of the instance's score for that class. The softmax function makes sure that the estimated probabilities sum to 1 over all classes.
I want to create a custom loss function is Keras, where the loss of the current prediction y_pred depends for the prediction of the previous training sample XXX and other parameters h, b, K. The loss function looks like the following and I don't know how to call the previous prediction (i.e. replace XXX) in Keras during training.
(for context: it's a loss function for quantile regression + fixed costs)
def custom_loss(y_true, y_pred, h, b, K, XXX):
if y_pred > XXX:
F = K
else:
F = 0
loss = h * max(0, y_pred - y_true) + b * max(0, y_true - y_pred) + F
return loss
Thanks for your help!
Let $F \in \mathbb{R}^{S \times F}$ be a matrix of features, I want to classify them using logistic regression with autograd [1]. The code I am using is similar to the one in the following example [2].
The only thing I want to change is that I have an additional weight matrix $W$ in $\mathbb{R}^{F \times L}$ that I want to apply to each feature. So each feature is multiplied with $W$ and then feed into the logistic regression.
Is it somehow possible to train $W$ and the weights of the logistic regression simultaneously using autograd?
I have tried the following code, unfortunately the weights stay at value 0.
import autograd.numpy as np
from autograd import grad
global inputs
def sigmoid(x):
return 0.5 * (np.tanh(x) + 1)
def logistic_predictions(weights, inputs):
# Outputs probability of a label being true according to logistic model.
return sigmoid(np.dot(inputs, weights))
def training_loss(weights):
global inputs
# Training loss is the negative log-likelihood of the training labels.
feature_weights = weights[3:]
feature_weights = np.reshape(feature_weights, (3, 3))
inputs = np.dot(inputs, feature_weights)
preds = logistic_predictions(weights[0:3], inputs)
label_probabilities = preds * targets + (1 - preds) * (1 - targets)
return -np.sum(np.log(label_probabilities))
# Build a toy dataset.
inputs = np.array([[0.52, 1.12, 0.77],
[0.88, -1.08, 0.15],
[0.52, 0.06, -1.30],
[0.74, -2.49, 1.39]])
targets = np.array([True, True, False, True])
# Define a function that returns gradients of training loss using autograd.
training_gradient_fun = grad(training_loss)
# Optimize weights using gradient descent.
weights = np.zeros([3 + 3 * 3])
print "Initial loss:", training_loss(weights)
for i in xrange(100):
print(i)
print(weights)
weights -= training_gradient_fun(weights) * 0.01
print "Trained loss:", training_loss(weights)
[1] https://github.com/HIPS/autograd
[2] https://github.com/HIPS/autograd/blob/master/examples/logistic_regression.py
Typical practice is to concatenate all "vectorized" parameters into the decision variables vector.
If you update logistic_predictions to include the W matrix, via something like
def logistic_predictions(weights_and_W, inputs):
'''
Here, :arg weights_and_W: is an array of the form [weights W.ravel()]
'''
# Outputs probability of a label being true according to logistic model.
weights = weights_and_W[:inputs.shape[1]]
W_raveled = weights_and_W[inputs.shape[1]:]
n_W = len(W_raveled)
W = W_raveled.reshape(inputs.shape[1], n_W/inputs.shape[1])
return sigmoid(np.dot(np.dot(inputs, W), weights))
then simply change traning_loss to (from the original source example)
def training_loss(weights_and_W):
# Training loss is the negative log-likelihood of the training labels.
preds = logistic_predictions(weights_and_W, inputs)
label_probabilities = preds * targets + (1 - preds) * (1 - targets)
return -np.sum(np.log(label_probabilities))