BERT Text Classification Loss is NaN

I'm trying to build a model that classifies text into 3 categories (Negative, Neutral, Positive).
I have a CSV file that contains comments on different apps along with their ratings.
First I import all the necessary libraries:
!pip install transformers
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%tensorflow_version 2.x
import tensorflow as tf
from transformers import TFBertForSequenceClassification, BertTokenizer,DistilBertTokenizer,glue_convert_examples_to_features, InputExample,BertConfig,InputFeatures
from sklearn.model_selection import train_test_split
from tqdm import tqdm
%matplotlib inline
Then I load my CSV file:
!gdown --id 1S6qMioqPJjyBLpLVz4gmRTnJHnjitnuV
!gdown --id 1zdmewp7ayS4js4VtrJEHzAheSW-5NBZv
df = pd.read_csv("reviews.csv")
print(df[['content','score']].head())
                                             content  score
0  Update: After getting a response from the deve...      1
1  Used it for a fair amount of time without any ...      1
2  Your app sucks now!!!!! Used to be good but no...      1
3  It seems OK, but very basic. Recurring tasks n...      1
4  Absolutely worthless. This app runs a prohibit...      1
Converting scores to sentiment
def to_sentiment(rating):
    rating = int(rating)
    if rating <= 2:
        return 0
    elif rating == 3:
        return 1
    else:
        return 2
df['sentiment'] = df.score.apply(to_sentiment)
tokenizer = BertTokenizer.from_pretrained('bert-base-cased',do_lower_case = True)
Creating Helper Methods to fit the data into model
def convert_example_to_feature(review):
    return tokenizer.encode_plus(
        review,
        add_special_tokens=True,
        max_length=160, # truncates if len(s) > max_length
        return_token_type_ids=True,
        return_attention_mask=True,
        pad_to_max_length=True, # pads to the right by default
    )
def map_example_to_dict(input_ids,attention_mask,token_type_ids,label):
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids" : token_type_ids
    },label
def encode_examples(ds):
    # prepare list, so that we can build up final TensorFlow dataset from slices.
    input_ids_list = []
    token_type_ids_list = []
    attention_mask_list = []
    label_list = []
    for index, row in tqdm(ds.iterrows()):
        bert_input = convert_example_to_feature(row['content'])
        input_ids_list.append(bert_input['input_ids'])
        token_type_ids_list.append(bert_input['token_type_ids'])
        attention_mask_list.append(bert_input['attention_mask'])
        label_list.append([row['sentiment']])
    return tf.data.Dataset.from_tensor_slices((input_ids_list, attention_mask_list, token_type_ids_list, label_list)).map(map_example_to_dict)
df_train, df_test = train_test_split(df,test_size=0.1)
Creating Model
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5, epsilon=1e-08)
loss = tf.keras.losses.SparseCategoricalCrossentropy()
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss,metrics=metric)
history = model.fit(ds_train_encoded,epochs=1)
14/443 [..............................] - ETA: 3:58 - loss: nan - accuracy: 0.3438
If I reduce the number of sentiment classes to just positive and negative, it works.
But with 3 or more labels, the loss becomes NaN.

The label class indices should start from 0, not 1.
TFBertForSequenceClassification requires labels in the range [0, 1, ..., config.num_labels - 1], as its documentation states:
labels (tf.Tensor of shape (batch_size,), optional, defaults to None)
– Labels for computing the sequence classification/regression loss.
Indices should be in [0, ..., config.num_labels - 1]. If
config.num_labels == 1 a regression loss is computed (Mean-Square
loss), If config.num_labels > 1 a classification loss is computed
(Cross-Entropy).
Source: https://huggingface.co/transformers/model_doc/bert.html#tfbertforsequenceclassification
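The range requirement quoted above can be checked before training. The sketch below is a plain-Python illustration, not part of the transformers API: `check_label_range` is a hypothetical helper, and the sketch assumes the model was loaded without a `num_labels` argument, in which case the classification head typically has two outputs. With the three sentiment classes from the question, label 2 then falls outside [0, num_labels - 1].

```python
def to_sentiment(rating):
    """Map a 1-5 star rating to 0 (negative), 1 (neutral) or 2 (positive)."""
    rating = int(rating)
    if rating <= 2:
        return 0
    if rating == 3:
        return 1
    return 2


def check_label_range(labels, num_labels):
    """Return the sorted labels that fall outside [0, num_labels - 1].

    Hypothetical helper for illustration; not a transformers function.
    """
    return sorted({label for label in labels if not 0 <= label < num_labels})


ratings = [1, 2, 3, 4, 5]
labels = [to_sentiment(r) for r in ratings]

# Assumed default head size of 2: label 2 is out of range.
print(check_label_range(labels, num_labels=2))  # [2]
# A head sized for three classes accepts every label.
print(check_label_range(labels, num_labels=3))  # []
```

If the check reports out-of-range labels, one option is to load the model with a matching head, for example `TFBertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=3)`. Because the model returns logits, the Keras loss would then usually be created with `from_logits=True`.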

Related

How to forecast a univariate time series 20/30 days ahead using tensorflow LSTM?

I used the code below for training and validation. It gives decent results, but I don't know how to forecast n periods ahead (e.g. 30/50 days ahead) using the trained model.
GitHub Link for the code with data output is here.
Import the libraries:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
from keras.models import Sequential
from keras.layers import Dense,Dropout,Conv1D,Bidirectional
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import warnings
warnings.filterwarnings('ignore')
import tensorflow
np.random.seed(1)
tensorflow.random.set_seed(1)
Load the univariate time series data and normalize the values:
dataframe=pd.read_sas('train.sas7bdat')
dataframe['Datetime']=pd.to_datetime(dataframe['Datetime'],format='%d%b%Y:%H:%M:%S')
dataframe.set_index('Datetime',inplace=True)
data=dataframe
# filter input data by datetime, i.e. keep rows from 1st May 2021 onwards
dataset=data[data.index>='2021-05-01 00:00:00']
# Replace null values with the previous 15-minute value
dataset.ffill(axis ='rows',inplace=True)
dataset.shape
def normalize_cols(df,cols):
    """Scale the values of each feature
    according to the column's max value"""
    data = df.loc[:,cols]
    for col in cols:
        scaler = lambda x: x / data[col].max()
        data[col] = data[col].apply(scaler)
    print(data[cols].head())
    return data[cols].values
features = df.columns.values # columns to train model on
X = normalize_cols(df,features)
Turn each signal into a labeled dataset:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
window_size = 15 #15 # num. days per training sample
batch_size = 64 # num. of samples per batch
buffer_size = 1000 # num of samples in memory for random selection
split_time = 400 # where to split the data for training/validation
forecast_length=10*24*4
def window_dataset(series, window_size, batch_size, shuffle_buffer):
    """Function to turn time series data into a set of sequences
    where the last value is the intended output of our model"""
    ser = tf.expand_dims(series, axis=-1)
    data = tf.data.Dataset.from_tensor_slices(series)
    data = data.window(window_size + 1, shift=1, drop_remainder=True)
    data = data.flat_map(lambda w: w.batch(window_size + 1))
    data = data.shuffle(shuffle_buffer)
    data = data.map(lambda w: (w[:-1], w[1:]))
    return data.batch(batch_size).prefetch(1)
x_train = X[:split_time]
x_test = X[split_time:]
print(f"Training data shape: {x_train.shape}")
print(f"Validation data shape: {x_test.shape}")
train_set = window_dataset(x_train,window_size,batch_size,buffer_size)
keras.backend.clear_session()
Choose and connect the model components:
# 1D convolution layers
conv1 = layers.Conv1D(
filters=60,kernel_size=15,strides=1,
padding="causal",activation="relu",
input_shape=[None,len(features)])
conv2 = layers.Conv1D(
filters=60,kernel_size=5,strides=1,
padding="causal",activation="tanh")
# Bidirectional LSTM layers
lstm1 = layers.Bidirectional(layers.LSTM(50,return_sequences=True))
lstm2 = layers.Bidirectional(layers.LSTM(20,return_sequences=True))
# Model construction
inputs = layers.Input(shape=(None,len(features)))
x = conv1(inputs)
x = lstm1(x)
x = lstm2(x)
x = conv2(x)
x = layers.Dense(60,activation='relu')(x)
x = layers.Dropout(.1)(x)
x = layers.Dense(1,activation='tanh')(x)
outputs = layers.Lambda(lambda x: 25*abs(x))(x)
#outputs = layers.Lambda(lambda x: 1*abs(x))(x)
# SGD optimizer and Huber loss
optimizer = keras.optimizers.SGD(lr=1e-5, momentum=0.9)
loss = keras.losses.Huber()
model = keras.Model(inputs=inputs,outputs=outputs)
model.compile(optimizer,loss,
metrics=["mae"])
model.summary()
"""
### Train model
"""
epochs = 100
history = model.fit(train_set, epochs=epochs, verbose=1)
print(f"Model trained for {epochs} epochs")
Inspect training results:
def model_forecast(model, X, window_size):
    """Takes in numpy array, creates a windowed tensor
    and predicts the following value on each window"""
    data = tf.data.Dataset.from_tensor_slices(X)
    data = data.window(window_size, shift=1, drop_remainder=True)
    data = data.flat_map(lambda w: w.batch(window_size))
    data = data.batch(32).prefetch(1)
    forecast = model.predict(data)
    return forecast
train_window = [i for i in range(split_time-window_size)]
forecast = model_forecast(model,x_train,window_size)
import seaborn as sns
plt.figure(figsize=(8,5),dpi=120)
sns.lineplot(train_window,forecast[:-1,1,0].reshape(-1),label='Forecast') #forecast[:-1,1,0]
sns.lineplot(train_window,X[:split_time-window_size].reshape(-1),label='actual_load')
Make predictions on test data:
val_window = [i for i in range(split_time,len(df)-window_size)]
forecast = model_forecast(model,x_test,window_size)
plt.figure(figsize=(8,5),dpi=120)
sns.lineplot(val_window,forecast[:-1,1,0].reshape(-1),label='Forecast')
sns.lineplot(val_window,X[split_time:-window_size].reshape(-1),label='actual_load')

Training a BERT and Running out of memory - Google Colab

I keep running out of memory even after I bought Google Colab Pro, which has 25 GB of RAM. I have no idea why this is happening. I tried every kernel possible (Google Colab, Google Colab Pro, Kaggle kernel, Amazon SageMaker, Google Cloud Platform). I reduced my batch size to 8, with no success whatsoever.
My goal is to train BERT in DeepPavlov (with the Russian text classification extension) to predict the emotion of a tweet. It is a multiclass classification problem with 5 classes.
Here is my whole code:
!pip3 install deeppavlov
import pandas as pd
train_df = pd.read_csv('train_pikabu.csv')
test_df = pd.read_csv('test_pikabu.csv')
val_df = pd.read_csv('validation_pikabu.csv')
from deeppavlov.dataset_readers.basic_classification_reader import BasicClassificationDatasetReader
# read data from particular columns of `.csv` file
data = BasicClassificationDatasetReader().read(
data_path='./',
train='train_pikabu.csv',
valid="validation_pikabu_a.csv",
test="test_pikabu.csv",
x = 'content',
y = 'emotions'
)
from deeppavlov.dataset_iterators.basic_classification_iterator import BasicClassificationDatasetIterator
# initializing an iterator
iterator = BasicClassificationDatasetIterator(data, seed=42, shuffle=True)
!python -m deeppavlov install squad_bert
from deeppavlov.models.preprocessors.bert_preprocessor import BertPreprocessor
bert_preprocessor = BertPreprocessor(vocab_file="./bert/vocab.txt",
do_lower_case=False,
max_seq_length=256)
from deeppavlov.core.data.simple_vocab import SimpleVocabulary
vocab = SimpleVocabulary(save_path="./binary_classes.dict")
iterator.get_instances(data_type="train")
vocab.fit(iterator.get_instances(data_type="train")[1])
from deeppavlov.models.preprocessors.one_hotter import OneHotter
one_hotter = OneHotter(depth=vocab.len,
single_vector=True # means we want to have one vector per sample
)
from deeppavlov.models.classifiers.proba2labels import Proba2Labels
prob2labels = Proba2Labels(max_proba=True)
from deeppavlov.models.bert.bert_classifier import BertClassifierModel
from deeppavlov.metrics.accuracy import sets_accuracy
bert_classifier = BertClassifierModel(
n_classes=vocab.len,
return_probas=True,
one_hot_labels=True,
bert_config_file="./bert/bert_config.json",
pretrained_bert="./bert/bert_model.ckpt",
save_path="sst_bert_model/model",
load_path="sst_bert_model/model",
keep_prob=0.5,
learning_rate=1e-05,
learning_rate_drop_patience=5,
learning_rate_drop_div=2.0
)
# Method `get_instances` returns all the samples of particular data field
x_valid, y_valid = iterator.get_instances(data_type="valid")
# You need to save model only when validation score is higher than previous one.
# This variable will contain the highest accuracy score
best_score = 0.
patience = 2
impatience = 0
# let's train for 3 epochs
for ep in range(3):
    nbatches = 0
    for x, y in iterator.gen_batches(batch_size=8,
                                     data_type="train", shuffle=True):
        x_feat = bert_preprocessor(x)
        y_onehot = one_hotter(vocab(y))
        bert_classifier.train_on_batch(x_feat, y_onehot)
        print("Batch done\n")
        nbatches += 1
        if nbatches % 1 == 0:
            # as written, this validates after every batch
            # (the original comment said "every 100 batches")
            y_valid_pred = bert_classifier(bert_preprocessor(x_valid))
            score = sets_accuracy(y_valid, vocab(prob2labels(y_valid_pred)))
            print("Batches done: {}. Valid Accuracy: {}".format(nbatches, score))
    y_valid_pred = bert_classifier(bert_preprocessor(x_valid))
    score = sets_accuracy(y_valid, vocab(prob2labels(y_valid_pred)))
    print("Epochs done: {}. Valid Accuracy: {}".format(ep + 1, score))
    if score > best_score:
        bert_classifier.save()
        print("New best score. Saving model.")
        best_score = score
        impatience = 0
    else:
        impatience += 1
        if impatience == patience:
            print("Out of patience. Stop training.")
            break
It runs for 1 batch and then crashes.
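In the loop above, `nbatches % 1 == 0` is always true, so the whole validation set is passed through `bert_classifier` in a single call right after the first training batch, which matches the point where the run stops. Whether that is the actual cause is an assumption, since the post does not include the error message. A minimal sketch of predicting in small chunks instead follows; `predict_in_chunks` and `fake_classifier` are hypothetical names, and `fake_classifier` merely stands in for `bert_classifier(bert_preprocessor(...))`.

```python
def predict_in_chunks(predict_fn, samples, chunk_size=8):
    """Call predict_fn on chunk_size samples at a time and join the results."""
    predictions = []
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        predictions.extend(predict_fn(chunk))
    return predictions


call_sizes = []  # records how many samples each call received


def fake_classifier(batch):
    """Stand-in model: returns the length of each text as its 'prediction'."""
    call_sizes.append(len(batch))
    return [len(text) for text in batch]


texts = ["a", "bb", "ccc", "dddd", "eeeee"]
print(predict_in_chunks(fake_classifier, texts, chunk_size=2))  # [1, 2, 3, 4, 5]
print(call_sizes)  # [2, 2, 1]
```

With this pattern the memory needed for validation depends on `chunk_size` rather than on the size of the validation set. Validating less often (for instance every 100 batches, as the original comment intended) would reduce the cost further.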

How can I plot training accuracy, training loss with respect to epochs? I'm using pre-trained Google BERT

I am new to machine learning programming. I want to plot training accuracy, training loss, validation accuracy, and validation loss in the following program.
I followed some tutorials to write it and it works fine, but I want these graphs:
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
# Commented out IPython magic to ensure Python compatibility.
# install
!pip install pytorch-pretrained-bert pytorch-nlp
!pip install awscli awsebcli botocore==1.18.18 --upgrade
# BERT imports
import torch
import keras
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from pytorch_pretrained_bert import BertTokenizer, BertConfig
from pytorch_pretrained_bert import BertAdam, BertForSequenceClassification
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy
from tqdm import tqdm, trange
import pandas as pd
import io
import numpy as np
import matplotlib.pyplot as plt
# % matplotlib inline
# specify GPU device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()
torch.cuda.get_device_name(0)
# Upload the train file from your local drive
from google.colab import files
uploaded = files.upload()
df = pd.read_csv("text.tsv", delimiter='\t', header=None, names=['sentence_source', 'sentence', 'label', 'label_notes'])
df.shape
df.sample(19)
# Create sentence and label lists
sentences = df.sentence.values
# We need to add special tokens at the beginning and end of each sentence for BERT to work properly
sentences = ["[CLS] " + sentence + " [SEP]" for sentence in sentences]
labels = df.label.values
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
tokenized_texts = [tokenizer.tokenize(sent) for sent in sentences]
print ("Tokenize the first sentence:")
print (tokenized_texts[0])
# Set the maximum sequence length. The longest sequence in our training set is 47, but we'll leave room on the end anyway.
# In the original paper, the authors used a length of 512.
MAX_LEN = 128
# Use the BERT tokenizer to convert the tokens to their index numbers in the BERT vocabulary
input_ids = [tokenizer.convert_tokens_to_ids(x) for x in tokenized_texts]
# Pad our input tokens
input_ids = pad_sequences(input_ids, maxlen=MAX_LEN, dtype="long", truncating="post", padding="post")
# Create attention masks
attention_masks = []
# Create a mask of 1s for each token followed by 0s for padding
for seq in input_ids:
    seq_mask = [float(i>0) for i in seq]
    attention_masks.append(seq_mask)
# Use train_test_split to split our data into train and validation sets for training
train_inputs, validation_inputs, train_labels, validation_labels = train_test_split(input_ids, labels,
random_state=2018, test_size=0.1, stratify=labels)
train_masks, validation_masks, _, _ = train_test_split(attention_masks, input_ids,
random_state=2018, test_size=0.1, stratify=labels)
#stratify
# Convert all of our data into torch tensors, the required datatype for our model
train_inputs = torch.tensor(train_inputs)
validation_inputs = torch.tensor(validation_inputs)
train_labels = torch.tensor(train_labels)
validation_labels = torch.tensor(validation_labels)
train_masks = torch.tensor(train_masks)
validation_masks = torch.tensor(validation_masks)
# Select a batch size for training. For fine-tuning BERT on a specific task, the authors recommend a batch size of 16 or 32
batch_size = 32
# Create an iterator of our data with torch DataLoader. This helps save on memory during training because, unlike a for loop,
# with an iterator the entire dataset does not need to be loaded into memory
train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
validation_data = TensorDataset(validation_inputs, validation_masks, validation_labels)
validation_sampler = SequentialSampler(validation_data)
validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size)
"""### **TRAIN**"""
# Load BertForSequenceClassification, the pretrained BERT model with a single linear classification layer on top.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=5)
model.cuda()
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'gamma', 'beta']
optimizer_grouped_parameters = [
{'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
'weight_decay_rate': 0.01},
{'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
'weight_decay_rate': 0.0}
]
# This variable contains all of the hyperparemeter information our training loop needs
optimizer = BertAdam(optimizer_grouped_parameters,
lr=2e-5,
warmup=.1)
# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)
t = []
# Store our loss and accuracy for plotting
train_loss_set = []
# Number of training epochs (authors recommend between 2 and 4)
epochs = 1
# trange is a tqdm wrapper around the normal python range
for _ in trange(epochs, desc="Epoch"):
    # Training
    # Set our model to training mode (as opposed to evaluation mode)
    model.train()
    # Tracking variables
    tr_loss = 0
    nb_tr_examples, nb_tr_steps = 0, 0
    # Train the data for one epoch
    for step, batch in enumerate(train_dataloader):
        # Add batch to GPU
        batch = tuple(t.to(device) for t in batch)
        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_labels = batch
        # Clear out the gradients (by default they accumulate)
        optimizer.zero_grad()
        # Forward pass
        loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)
        train_loss_set.append(loss.item())
        # Backward pass
        loss.backward()
        # Update parameters and take a step using the computed gradient
        optimizer.step()
        # Update tracking variables
        tr_loss += loss.item()
        nb_tr_examples += b_input_ids.size(0)
        nb_tr_steps += 1
    print("Train loss: {}".format(tr_loss/nb_tr_steps))
    # Validation
    # Put model in evaluation mode to evaluate loss on the validation set
    model.eval()
    # Tracking variables
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0
    # Evaluate data for one epoch
    for batch in validation_dataloader:
        # Add batch to GPU
        batch = tuple(t.to(device) for t in batch)
        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_labels = batch
        # Telling the model not to compute or store gradients, saving memory and speeding up validation
        with torch.no_grad():
            # Forward pass, calculate logit predictions
            logits = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)
        # Move logits and labels to CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        tmp_eval_accuracy = flat_accuracy(logits, label_ids)
        eval_accuracy += tmp_eval_accuracy
        nb_eval_steps += 1
    print("Validation Accuracy: {}".format(eval_accuracy/nb_eval_steps))
# plot training performance
plt.figure(figsize=(15,8))
plt.title("Training loss")
plt.xlabel("Batch")
plt.ylabel("Loss")
plt.plot(train_loss_set)
plt.show()

You can use TensorBoard within Google Colab:
#this loads Tensorboard notebooks extension so it displays inline
%load_ext tensorboard
...
#This will show tensorboard before training begins and it will update as training continues
%tensorboard --logdir logs
...
#training code
Google's official guidelines are here: https://colab.research.google.com/github/tensorflow/tensorboard/blob/master/docs/tensorboard_in_notebooks.ipynb
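If TensorBoard is not an option, the per-batch values already collected in the question (`train_loss_set`) can be averaged per epoch and plotted with matplotlib. The sketch below covers only the averaging step. `epoch_means` is a hypothetical helper, and the loss values are made-up toy numbers, not results from the question's model.

```python
def epoch_means(batch_values, batches_per_epoch):
    """Average consecutive groups of per-batch values, one group per epoch."""
    if batches_per_epoch <= 0:
        raise ValueError("batches_per_epoch must be positive")
    means = []
    for start in range(0, len(batch_values), batches_per_epoch):
        group = batch_values[start:start + batches_per_epoch]
        means.append(sum(group) / len(group))
    return means


# Toy per-batch losses for 2 epochs of 3 batches each.
train_loss_set = [1.0, 0.5, 0.75, 0.5, 0.25, 0.0]
print(epoch_means(train_loss_set, batches_per_epoch=3))  # [0.75, 0.25]
```

The resulting list can be passed to `plt.plot` with the epoch number on the x-axis. Collecting validation loss and the two accuracies in the same way, one value per epoch, would give the four curves asked for.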

Keras model predicting wrong values(accuracy: 0.0000e+00)

Cue the cliché "This is my first Keras project", but alas, this is the truth. I apologize in advance for any cringe beginner mistakes.
How the data is set up
Column A: The time a given train is scheduled to depart, in 24-hour format.
Column B: An integer representation of the given train's destination, e.g. California == 2, New York == 0.
Column C: The track assigned to the given train.
Screenshot of data setup
GOAL
Using this data, can we predict the track number from the time and location?
Current Attempt
# multivariate one step problem
from numpy import array
from numpy import hstack
from numpy import insert
from numpy import zeros,newaxis
from numpy import reshape
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
import pandas as pd
file_name = "DATA_DUMP.csv"
destination = 0
departure = 1622
#extract values
raw_data = pd.read_csv(file_name)
data = raw_data
in_seq1 = array([data['TIME'].values])
in_seq2 = array([data['LOCATION'].values])
result = array([data['TRACK'][0:-1].values])
# reshape series
in_seq1 = in_seq1.reshape((in_seq1.shape[1],len(in_seq1)))
in_seq2 = in_seq2.reshape((in_seq2.shape[1],len(in_seq2)))
result = result.reshape((result.shape[1],len(result)))
dataset = hstack((in_seq1, in_seq2))
result = insert(result,0,0)
result = result.reshape((len(result),1))
# define generator
n_features = dataset.shape[1]
n_input = 1
generator = TimeseriesGenerator(dataset, result, length=n_input, batch_size=1)
for i in range(len(generator)):
    x, y = generator[i]
    print('%s => %s' % (x, y))
# define model
model = Sequential()
model.add(LSTM(100, activation='sigmoid', input_shape=(n_input, n_features)))
model.add(Dense(1))
model.compile(optimizer=Adam(lr=0.00001), loss='mse',metrics=['accuracy'])
# fit model
model.fit_generator(generator, steps_per_epoch=1, epochs=500, verbose=2)
# make a one step prediction out of sample
raw_array = array([1507,3]) #predict arrival at 15:07 destination 3, what track will it be?
x_input = array(raw_array).reshape((1,n_input,n_features))
yhat = model.predict(x_input, verbose=1)
print(yhat)
The Problem
Although my code runs, I am getting extremely inaccurate predictions. I'm assuming this is due to my large loss. Any help in getting this model up and running would be greatly appreciated.
Epoch 500/500
1/1 - 0s - loss: 424.2032 - accuracy: 0.0000e+00

Reproducible PyTorch Results & Random Seeds

I have a simple toy NN with Pytorch. I am setting all the seeds I can find in the docs as well as numpy random.
If I run the code below from top to bottom, the results appear to be reproducible.
BUT, if I run block 1 only once and then each time run block 2, the result changes (sometimes dramatically). I am unsure why this happens since the network is being re-initialized and optimizer reset each time.
I am using version 0.4.0
BLOCK #1
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import torch
import torch.utils.data as utils_data
from torch.autograd import Variable
from torch import optim, nn
from torch.utils.data import Dataset
import torch.nn.functional as F
from torch.nn.init import xavier_uniform_, xavier_normal_,uniform_
torch.manual_seed(123)
import random
random.seed(123)
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
%matplotlib inline
cuda=True #set to true uses GPU
if cuda:
    torch.cuda.manual_seed(123)
#load boston data from scikit
boston = load_boston()
x=boston.data
y=boston.target
y=y.reshape(y.shape[0],1)
#train and test
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.3, random_state=123, shuffle=False)
#change to tensors
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)
#create dataset and use data loader
training_samples = utils_data.TensorDataset(x_train, y_train)
data_loader_trn = utils_data.DataLoader(training_samples, batch_size=64,drop_last=False)
#change to tensors
x_test = torch.from_numpy(x_test)
y_test = torch.from_numpy(y_test)
#create dataset and use data loader
testing_samples = utils_data.TensorDataset(x_test, y_test)
data_loader_test = utils_data.DataLoader(testing_samples, batch_size=64,drop_last=False)
#simple model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        #all the layers
        self.fc1 = nn.Linear(x.shape[1], 20)
        xavier_uniform_(self.fc1.weight.data) #this is how you can change the weight init
        self.drop = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(20, 1)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x= self.drop(x)
        x = self.fc2(x)
        return x
BLOCK #2
net=Net()
if cuda:
    net.cuda()
# create an Adam optimizer
optimizer = optim.Adam(net.parameters())
# create a loss function (mse)
loss = nn.MSELoss(size_average=False)
# run the main training loop
epochs =20
hold_loss=[]
for epoch in range(epochs):
    cum_loss=0.
    cum_records_epoch =0
    for batch_idx, (data, target) in enumerate(data_loader_trn):
        tr_x, tr_y = data.float(), target.float()
        if cuda:
            tr_x, tr_y = tr_x.cuda(), tr_y.cuda()
        # Reset gradient
        optimizer.zero_grad()
        # Forward pass
        fx = net(tr_x)
        output = loss(fx, tr_y) #loss for this batch
        cum_loss += output.item() #accumulate the loss
        # Backward
        output.backward()
        # Update parameters based on backprop
        optimizer.step()
        cum_records_epoch +=len(tr_x)
        if batch_idx % 1 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, cum_records_epoch, len(data_loader_trn.dataset),
                100. * (batch_idx+1) / len(data_loader_trn), output.item()))
    print('Epoch average loss: {:.6f}'.format(cum_loss/cum_records_epoch))
    hold_loss.append(cum_loss/cum_records_epoch)
#training loss
plt.plot(np.array(hold_loss))
plt.show()

Possible Reason
Without knowing what the "sometimes dramatic differences" are, it is hard to answer for sure; but getting different results when running [block_1 x1; block_2 x1] xN (read "running block_1 then block_2 once, and repeating both operations N times") and [block_1 x1; block_2 xN] x1 makes sense, given how pseudo-random number generators (PRNGs) and seeds work.
In the first case, you are re-initializing the PRNGs in block_1 after each block_2, so each of the N instances of block_2 will access the same sequence of pseudo-random numbers, seeded by each block_1 before.
In the second case, the PRNGs are initialized only once, by the single block_1 run. So each instance of block_2 will have different random values.
(For more on PRNGs and seeds, you could check: random.seed(): What does it do?)
Simplified Example
Let's suppose numpy/CUDA/pytorch are actually using a really poor PRNG, which only returns incremented values (i.e. PRNG(x_n) = PRNG(x_(n-1)) + 1, with x_0 = seed). If you seed this generator with 0, it will thus return 1 on the first random() call, 2 on the second call, etc.
Now let's also simplify your blocks for the sake of the example:
def block_1():
    seed = 0
    print("seed: {}".format(seed))
    prng.seed(seed)

def block_2():
    res = "random results:"
    for i in range(4):
        res += " {}".format(prng.random())
    print(res)
Let's compare [block_1 x1; block_2 x1] xN and [block_1 x1; block_2 xN] x1 with N=3:
for i in range(3):
    block_1()
    block_2()
# > seed: 0
# > random results: 1 2 3 4
# > seed: 0
# > random results: 1 2 3 4
# > seed: 0
# > random results: 1 2 3 4
block_1()
for i in range(3):
    block_2()
# > seed: 0
# > random results: 1 2 3 4
# > random results: 5 6 7 8
# > random results: 9 10 11 12
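The toy generator described above can be written out as runnable code. `CounterPRNG` is a hypothetical class that mimics the incrementing generator from the example; it is not how numpy, CUDA, or PyTorch actually generate numbers.

```python
class CounterPRNG:
    """Toy generator: each call to random() returns the previous value + 1."""

    def __init__(self):
        self.state = 0

    def seed(self, value):
        self.state = value

    def random(self):
        self.state += 1
        return self.state


prng = CounterPRNG()


def block_1():
    prng.seed(0)


def block_2():
    return [prng.random() for _ in range(4)]


# Case 1: re-seed before every run of block_2.
case_1 = []
for _ in range(3):
    block_1()
    case_1.append(block_2())

# Case 2: seed once, then run block_2 three times.
block_1()
case_2 = [block_2() for _ in range(3)]

print(case_1)  # [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(case_2)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

With the real libraries, one option would be to repeat the seeding calls from block 1 (`torch.manual_seed`, `torch.cuda.manual_seed`, `random.seed`) at the top of block 2, so that every run of block 2 starts from the same generator state.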
