How to append predicted data from test loader every other one - python

I want to appended every other predicted images and real one in CNN code, but I am not sure how to implement it.
The code is as below:
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False,
num_workers=num_workers, pin_memory=True)
test_pred=[]
test_real=[]
model.eval()
with torch.no_grad():
for data in test_loader:
x_test_batch, y_test_batch = data[0].to(device,
dtype=torch.float), data[1].to(device, dtype=torch.float)
y_test_pred = model(x_test_batch)
mse_val_loss = criterion(y_test_batch, y_test_pred, x_test_batch, mse)
mae_val_loss = criterion(y_test_batch, y_test_pred, x_test_batch, l1loss)
mse_val_losses.append(mse_val_loss.item())
mae_val_losses.append(mae_val_loss.item())
N_test.append(len(x_test_batch))
test_pred.append(y_test_pred[::2])
test_real.append(y_test_batch[::2])

Related

Using model.fit with different data generators

I'm using this code to train my CNN:
history = model.fit(
x = train_gen1,
epochs = epochs,
validation_data = valid_gen,
callbacks = noaug_callbacks,
class_weight = class_weight
).history
where train_gen1 is made using ImageDataGenerator. But what if I want to use 2 different types of generator (let's call them train_gen1 and train_gen2) both feeding my training phase? How can I change my code in order to do so?
This is the way I generate:
aug_train_data_gen = ImageDataGenerator(rotation_range=0,
height_shift_range=40,
width_shift_range=40,
zoom_range=0,
horizontal_flip=True,
vertical_flip=True,
fill_mode='reflect',
preprocessing_function=preprocess_input
)
train_gen1 = aug_train_data_gen.flow_from_directory(directory=training_dir,
target_size=(96,96),
color_mode='rgb',
classes=None, # can be set to labels
class_mode='categorical',
batch_size=512,
shuffle= False #set to false if need to compare images
)

How to do both Data Augmentation and Cross Validation at the same time in NLP?

I have read somewhere that you should not use data augmentation on your validation set, and you should only use it on your training set.
My problem is this:
I have a dataset which has less number of training samples and I want to use data augmentation.
I split the dataset into training and test set and use data augmentation on the training set. I then use StratifiedKfold on the training set, which returns me a train index and a test index, but if I use the X_train[test index] as my validation set, it has some augmented images and I don't want that.
Is there any way to do data augmentation on the training set and do cross-validation ?
Here is my code(I haven't done data augmentation but would love to get a way to separate out the test_index from the augmented training samples.):
kfold = StratifiedKFold(n_splits=5,shuffle=True)
i=1
for train_index , test_index in kfold.split(X_train,y_train):
dataset_train = tf.data.Dataset.from_tensor_slices((X_train[train_index],
y_train.iloc[train_index])).shuffle(len(X_train[train_index]))
dataset_train = dataset_train.batch(512,drop_remainder=True).repeat()
dataset_test = tf.data.Dataset.from_tensor_slices((X_train[test_index],
y_train.iloc[test_index])).shuffle(len(X_train[test_index]))
dataset_test = dataset_test.batch(32,drop_remainder=True).take(steps_per_epoch).repeat()
model_1 = deep_neural()
print('--------------------------------------------------------------------------------------------
-------------------------------')
print('\n')
print(f'Training for fold {i} ...')
print('Training on {} samples.........Validating on {} samples'.format(len(X_train[train_index]),
len(X_train[test_index])))
checkpoint = tf.keras.callbacks.ModelCheckpoint(get_model_name(i),
monitor='val_loss', verbose=1,
save_best_only=True, mode='min')
history = model_1.fit(dataset_train,steps_per_epoch = len(X_train[train_index])//BATCH_SIZE,
epochs=4,validation_data=dataset_test,
validation_steps=1,callbacks=[csv_logger,checkpoint])
scores = model_1.evaluate(X_test,y_test,verbose=0)
pred_classes = model_1.predict(X_test).argmax(1)
f1score = f1_score(y_test,pred_classes,average='macro')
print('\n')
print(f'Score for fold {i}: {model_1.metrics_names[0]} of {scores[0]}; {model_1.metrics_names[1]}
of {scores[1]*100}; F1 Score of {f1score}%')
print('\n')
acc_per_fold.append(scores[1] * 100)
loss_per_fold.append(scores[0])
f1score_per_fold.append(f1score)
tf.keras.backend.clear_session()
gc.collect()
del model_1
i=i+1

implementation decision trees in tensorflow

feature_columns = []
for feature_name in train.columns.tolist() :
feature_columns.append(tf.feature_column.numeric_column(feature_name,dtype=tf.float32))
# Use entire batch since this is such a small dataset.
NUM_EXAMPLES = len(y_train)
def make_input_fn(X, y, n_epochs=None, shuffle=True):
def input_fn():
dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
if shuffle:
dataset = dataset.shuffle(NUM_EXAMPLES)
# For training, cycle thru dataset as many times as need (n_epochs=None).
dataset = dataset.repeat(n_epochs)
# In memory training doesn't use batching.
dataset = dataset.batch(NUM_EXAMPLES)
return dataset
return input_fn
# Training and evaluation input functions.
train_input_fn = make_input_fn(X_train, y_train)
eval_input_fn = make_input_fn(X_test, y_test, shuffle=False, n_epochs=1)
n_batches = 1
est = tf.estimator.BoostedTreesClassifier(feature_columns,
n_batches_per_layer=n_batches)
est.train(train_input_fn, max_steps=100)
result = est.evaluate(eval_input_fn)
result
built a decision tree model. like everything works, trains, checks for validation. but I just can't run the test sample (
test_input_fn = tf.data.Dataset.from_tensors(dict(X))
prediction = list(est.predict(test_input_fn))
example by which I studied
https://www.tensorflow.org/tutorials/estimator/boosted_trees
and this is where I read all sorts of parameters. I just can't figure out how to get predictions on the test sample
https://www.tensorflow.org/api_docs/python/tf/estimator/BoostedTreesClassifier
test_input_fn = make_input_fn(test, test.index, shuffle=False, n_epochs=1)
preds = est.predict(test_input_fn)
preds = [pred['class_ids'][0] for pred in preds]
pd.DataFrame({'PassengerId': dataTest.PassengerId, 'Survived':
preds}).to_csv('submission.csv', index=False)
!head submission.csv

How to split test and train data in a dataset based on number of targets in each category

I have an imageFolder in PyTorch which holds my categorized data images. Each folder is the name of the category and in the folder are images of that category.
I've loaded data and split train and test data via a sampler with random train_test_split. But the problem is my data distribution isn't good and some classes have lots of images and some classes have fewer.
So for solving this problem I want to choose 20% of each class as my test data and the rest would be the train data
ds = ImageFolder(filePath, transform=transform)
batch_size = 64
validation_split = 0.2
indices = list(range(len(ds))) # indices of the dataset
# TODO: fix spliting
train_indices,test_indices = train_test_split(indices,test_size=0.2)
# Creating PT data samplers and loaders:
train_sampler = SubsetRandomSampler(train_indices)
test_sampler = SubsetRandomSampler(test_indices)
train_loader = torch.utils.data.DataLoader(ds, batch_size=batch_size, sampler=train_sampler, num_workers=16)
test_loader = torch.utils.data.DataLoader(ds, batch_size=batch_size, sampler=test_sampler, num_workers=16)
Any idea of how I should fix it?
Use the stratify argument in train_test_split according to the docs. If your label indices is an array-like called y, do:
train_indices,test_indices = train_test_split(indices, test_size=0.2, stratify=y)
Try using StratifiedKFold or StratifiedShuffleSplit.
According to the docs:
This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.
In your case you can try:
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.5, random_state=0)
for train_index, test_index in sss.split(ds):
train = torch.utils.data.Subset(dataset, train_index)
test = torch.utils.data.Subset(dataset, test_index)
trainloader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=False)
testloader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=False)

Getting different result even after having the same subset of data

I have tried searching for possible similar questions, but I have not been able to find any so far. My problem is:
I want to use cross-validation using non-overlapping subsets of data using KFold. What I did was to create subsets using KFold and fix the outcome by setting randome_state to a certain integer. When I print out the subsets multiple times results look OK. However, the problem is when I use the same subsets on a model multiple times using model.predict (meaning running my code multiple times), I get different results. Naturally, I suspect there is something wrong with my implementation of training model. But I cannot figure out what it is. I would very much appreciate a hint. Here is my code:
random.seed(42)
# define K-fold cross validation test harness
kf = KFold(n_splits=3, random_state=42, shuffle=True)
for train_index, test_index in kf.split(data):l
print ('Train', train_index, '\nTest ', test_index)
# create model
testX= data[test_index]
trainX = data[train_index]
testYcheck = labels[test_index]
testP = Path[test_index]
# convert the labels from integers to vectors
trainY = to_categorical(labels[train_index], num_classes=2)
testY = to_categorical(labels[test_index], num_classes=2)
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
horizontal_flip=True, fill_mode="nearest")
# train the network
print("[INFO] training network...")
model.fit_generator(aug.flow(trainX, trainY, batch_size=BS),
validation_data=(testX, testY),
steps_per_epoch=len(trainX) // BS, epochs=EPOCHS, verbose=1)
#predict the test data
y_pred = model.predict(testX)
predYl = []
for element in range(len(y_pred)):
if y_pred[element,1] > y_pred[element,0]:
predYl.append(1)
else:
predYl.append(0)
pred_Y= np.array(predYl)
# Compute confusion matrix
cnf_matrix = confusion_matrix(testYcheck, pred_Y)
np.set_printoptions(precision=2)
print (cnf_matrix)

Categories

Resources