How to preprocess my ImageDataset using Keras (Augmentation, Split) - python

I have a project on object detection.
I have only a small amount of data and want to apply data augmentation using Keras, but I get errors when I try to split my data into training and test sets and save them.
How can I do all of this?
What I want to do:
First, I want to resize my image dataset, then split the data randomly into training and test sets.
After that, I want to save them into 'training' and 'test' directories and then apply data augmentation to the training folder.
from tensorflow.keras.applications.xception import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
data_dir=/..path/
ds_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    validation_split=0.2
)
train_ds = ds_gen.flow_from_directory(
    "data_dir",
    seed=1,
    target_size=(150, 150), #adjust to your needs
    batch_size=32,#adjust to your needs
    save_to_dir= data_dir/training
    subset='training'
)
val_ds = ds_gen.flow_from_directory(
    "data_dir",
    seed=1,
    target_size=(150, 150),
    batch_size=32,
    save_to_dir= data_dir/validation
    subset='validation'
)

I recommend using ImageDataGenerator.flow_from_dataframe to do what you want. Since you are using flow_from_directory, your data is already organized into one subdirectory per class, so the code below can read in the image information and create three data frames: train_df, test_df and valid_df.
import os
import pandas as pd
from sklearn.model_selection import train_test_split

def preprocess(sdir, trsplit, vsplit, random_seed):
    # collect every image path together with its class label (the subdirectory name)
    filepaths=[]
    labels=[]
    classlist=os.listdir(sdir)
    for klass in classlist:
        classpath=os.path.join(sdir,klass)
        flist=os.listdir(classpath)
        for f in flist:
            fpath=os.path.join(classpath,f)
            filepaths.append(fpath)
            labels.append(klass)
    Fseries=pd.Series(filepaths, name='filepaths')
    Lseries=pd.Series(labels, name='labels')
    df=pd.concat([Fseries, Lseries], axis=1)
    # split df into train_df and a remainder (dummy_df), stratified by label
    dsplit=vsplit/(1-trsplit) # fraction of the remainder that goes to the validation set
    strat=df['labels']
    train_df, dummy_df=train_test_split(df, train_size=trsplit, shuffle=True, random_state=random_seed, stratify=strat)
    # split the remainder into valid_df and test_df
    strat=dummy_df['labels']
    valid_df, test_df=train_test_split(dummy_df, train_size=dsplit, shuffle=True, random_state=random_seed, stratify=strat)
    print('train_df length: ', len(train_df), ' test_df length: ',len(test_df), ' valid_df length: ', len(valid_df))
    print(train_df['labels'].value_counts())
    return train_df, test_df, valid_df

sdir='/..path/' # directory that holds one subdirectory per class
train_split=.8 # set this to the fraction of the data you want for the train set
valid_split=.1 # set this to the fraction of the data you want for a validation set
# note: the fraction used for test is 1-train_split-valid_split
random_seed=123 # any fixed integer makes the split reproducible
train_df, test_df, valid_df= preprocess(sdir,train_split, valid_split, random_seed)
The function shows the balance between the classes, that is, how many samples the training dataframe contains for each class. Examine this data and decide on the number of samples you want in every class. For example, if class0 has 3000 samples, class1 has 1200 samples and class2 has 800 samples, you may decide that every class in the training dataframe should have 1000 samples (max_samples=1000). That implies that for class2 you have to create 200 augmented images, and for classes 0 and 1 you need to reduce the number of images. The functions below will do that for you.
The trim function caps the number of samples in a class at max_size. The balance function uses the trim function, then creates directories to store the augmented images, creates an aug_df dataframe and merges it with the train_df dataframe. The result is a composite dataframe ndf that serves as the training set and is balanced, with exactly max_samples samples in each class.
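Before running the full functions, the arithmetic from the example above can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper name plan_balance is made up for this example and is not part of the answer's code or of Keras.

```python
def plan_balance(class_counts, max_samples):
    """Return, per class, how many images to drop and how many to create by augmentation."""
    plan = {}
    for label, count in class_counts.items():
        plan[label] = {
            "drop": max(count - max_samples, 0),      # surplus images to trim away
            "augment": max(max_samples - count, 0),   # missing images to generate
        }
    return plan

# Class counts taken from the example in the text
counts = {"class0": 3000, "class1": 1200, "class2": 800}
plan = plan_balance(counts, max_samples=1000)
for label, actions in plan.items():
    print(label, actions)
```

With max_samples=1000, class0 and class1 are trimmed and only class2 needs augmented images, which matches the description above.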
import os
import shutil
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def trim(df, max_size, min_size, column):
    # keep at most max_size samples per class; classes with fewer than min_size samples are dropped
    df=df.copy()
    sample_list=[]
    groups=df.groupby(column)
    for label in df[column].unique():
        group=groups.get_group(label)
        sample_count=len(group)
        if sample_count> max_size:
            samples=group.sample(max_size, replace=False, weights=None, random_state=123, axis=0).reset_index(drop=True)
            sample_list.append(samples)
        elif sample_count>= min_size:
            sample_list.append(group)
    df=pd.concat(sample_list, axis=0).reset_index(drop=True)
    balance=list(df[column].value_counts())
    print(balance)
    return df

def balance(train_df, max_samples, min_samples, column, working_dir, image_size):
    train_df=train_df.copy()
    train_df=trim(train_df, max_samples, min_samples, column)
    # make directories to store augmented images
    aug_dir=os.path.join(working_dir, 'aug')
    if os.path.isdir(aug_dir):
        shutil.rmtree(aug_dir)
    os.mkdir(aug_dir)
    for label in train_df['labels'].unique():
        dir_path=os.path.join(aug_dir,label)
        os.mkdir(dir_path)
    # create and store the augmented images
    total=0
    gen=ImageDataGenerator(horizontal_flip=True, rotation_range=20, width_shift_range=.2,
                           height_shift_range=.2, zoom_range=.2)
    groups=train_df.groupby('labels') # group by class
    for label in train_df['labels'].unique(): # for every class
        group=groups.get_group(label) # a dataframe holding only rows with the specified label
        sample_count=len(group) # determine how many samples there are in this class
        if sample_count< max_samples: # if the class has less than target number of images
            aug_img_count=0
            delta=max_samples-sample_count # number of augmented images to create
            target_dir=os.path.join(aug_dir, label) # define where to write the images
            aug_gen=gen.flow_from_dataframe(group, x_col='filepaths', y_col=None, target_size=image_size,
                                            class_mode=None, batch_size=1, shuffle=False,
                                            save_to_dir=target_dir, save_prefix='aug-', color_mode='rgb',
                                            save_format='jpg')
            while aug_img_count<delta:
                images=next(aug_gen) # each call writes one augmented image to target_dir
                aug_img_count += len(images)
            total +=aug_img_count
    print('Total Augmented images created= ', total)
    # create aug_df and merge with train_df to create composite training set ndf
    if total>0:
        aug_fpaths=[]
        aug_labels=[]
        classlist=os.listdir(aug_dir)
        for klass in classlist:
            classpath=os.path.join(aug_dir, klass)
            flist=os.listdir(classpath)
            for f in flist:
                fpath=os.path.join(classpath,f)
                aug_fpaths.append(fpath)
                aug_labels.append(klass)
        Fseries=pd.Series(aug_fpaths, name='filepaths')
        Lseries=pd.Series(aug_labels, name='labels')
        aug_df=pd.concat([Fseries, Lseries], axis=1)
        ndf=pd.concat([train_df,aug_df], axis=0).reset_index(drop=True)
    else:
        ndf=train_df
    print(list(ndf['labels'].value_counts()))
    return ndf

max_samples= 1000 # set this to how many samples you want in each class
min_samples=0
column='labels'
working_dir = r'./' # this is the directory where the augmented images will be stored
img_size=(224,224) # set this to the image size you want for the images
ndf=balance(train_df,max_samples, min_samples, column, working_dir, img_size)
Now create the train, test and valid generators:
channels=3
batch_size=30
img_shape=(img_size[0], img_size[1], channels)
length=len(test_df)
# pick the largest batch size of at most 80 that divides the test set evenly,
# so that test_steps batches cover every test sample exactly once
test_batch_size=sorted([int(length/n) for n in range(1,length+1) if length % n ==0 and length/n<=80],reverse=True)[0]
test_steps=int(length/test_batch_size)
print ( 'test batch size: ' ,test_batch_size, ' test steps: ', test_steps)
def scalar(img):
    return img # EfficientNet expects pixels in range 0 to 255 so no scaling is required
trgen=ImageDataGenerator(preprocessing_function=scalar, horizontal_flip=True)
tvgen=ImageDataGenerator(preprocessing_function=scalar)
train_gen=trgen.flow_from_dataframe( ndf, x_col='filepaths', y_col='labels', target_size=img_size, class_mode='categorical',
color_mode='rgb', shuffle=True, batch_size=batch_size)
test_gen=tvgen.flow_from_dataframe( test_df, x_col='filepaths', y_col='labels', target_size=img_size, class_mode='categorical',
color_mode='rgb', shuffle=False, batch_size=test_batch_size)
valid_gen=tvgen.flow_from_dataframe( valid_df, x_col='filepaths', y_col='labels', target_size=img_size, class_mode='categorical',
color_mode='rgb', shuffle=True, batch_size=batch_size)
classes=list(train_gen.class_indices.keys())
class_count=len(classes)
Now use train_gen and valid_gen in model.fit, and use test_gen in model.evaluate or model.predict.
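The approach above keeps the split in dataframes. If you also want the split written to disk as 'training' and 'test' directories, as the question asks, a minimal standard-library sketch could look like the following. The function name split_to_dirs, the output layout and the placeholder files in the demo are assumptions made for illustration; resizing is left to the target_size argument of the Keras generators.

```python
import os
import random
import shutil
import tempfile

def split_to_dirs(src_dir, dst_dir, test_fraction=0.2, seed=1):
    """Copy src_dir/<class>/* into dst_dir/training/<class>/ and dst_dir/test/<class>/."""
    rng = random.Random(seed)
    counts = {"training": 0, "test": 0}
    for klass in sorted(os.listdir(src_dir)):
        class_dir = os.path.join(src_dir, klass)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)  # random but reproducible order
        n_test = int(round(len(files) * test_fraction))
        for i, name in enumerate(files):
            subset = "test" if i < n_test else "training"
            target = os.path.join(dst_dir, subset, klass)
            os.makedirs(target, exist_ok=True)
            shutil.copy2(os.path.join(class_dir, name), os.path.join(target, name))
            counts[subset] += 1
    return counts

# Demo on a throwaway directory; empty placeholder files stand in for real images
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "raw")
    for klass in ("cat", "dog"):
        os.makedirs(os.path.join(src, klass))
        for i in range(10):
            open(os.path.join(src, klass, f"{klass}_{i}.jpg"), "w").close()
    demo_counts = split_to_dirs(src, os.path.join(root, "split"), test_fraction=0.2)
    print(demo_counts)
```

Because the split is done per class folder, each class keeps roughly the same train/test ratio. The resulting training directory can then be passed to flow_from_directory with whatever augmentation options you need.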

Related

How to resize image tensors

The following is my code where I'm converting every image to PIL and then turning them into Pytorch tensors:
transform = transforms.Compose([transforms.PILToTensor()])
# choose the training and test datasets
train_data = os.listdir('data/training/')
testing_data = os.listdir('data/testing/')
train_tensors = []
test_tensors = []
for train_image in train_data:
    img = Image.open('data/training/' + train_image)
    train_tensors.append(transform(img))
for test_image in testing_data:
    img = Image.open('data/testing/' + test_image)
    test_tensors.append(transform(img))
# Print out some stats about the training and test data
print('Train data, number of images: ', len(train_data))
print('Test data, number of images: ', len(testing_data))
batch_size = 20
train_loader = DataLoader(train_tensors, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_tensors, batch_size=batch_size, shuffle=True)
# specify the image classes
classes = ['checked', 'unchecked', 'other']
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = dataiter.next()
images = images.numpy()
However, I am getting this error:
RuntimeError: stack expects each tensor to be equal size, but got [4, 66, 268] at entry 0 and [4, 88, 160] at entry 1
This is because my images are not resized prior to PIL -> Tensor. What is the correct way of resizing data images?
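Try using ImageFolder from torchvision. Since your images have different sizes, you can use CenterCrop or RandomResizedCrop, depending on your task; see the full list of transforms in the torchvision documentation.
Here is an example:
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

img_size = 224   # assumed value; target size as an int or (height, width) tuple
batch_size = 20  # same batch size as in the question
train_dir = "data/training/"  # ImageFolder expects one subfolder per class here
train_dataset = datasets.ImageFolder(
    train_dir,
    transforms.Compose([
        transforms.RandomResizedCrop(img_size),  # image size int or tuple
        # Add more transforms here
        transforms.ToTensor(),  # convert to tensor at the end
    ]))
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)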

AutoKeras Image Classifier: Generator not working and plt.show() gives empty image

I am trying to build an image classification program using AutoKeras, Tensorflow, and Pandas.
The code is as follows:
from keras_preprocessing.image import ImageDataGenerator
import autokeras as ak
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
# directory with subfolders (that contain other subfolders) that contain images
data_dir = "/home/jack/project/"
# dataframe initialization
dataframe = pd.read_excel("/home/jack/project/pathsandlabels.xlsx")
# splitting the dataset
train_dataframe = dataframe.sample(frac=0.75, random_state=200)
test_dataframe = dataframe.drop(train_dataframe.index)
# Augmenting it
datagen = ImageDataGenerator(rescale=1./255., horizontal_flip=True, shear_range=0.6, zoom_range=0.4,
validation_split=0.25)
# Setting up a train generator
train_generator = datagen.flow_from_dataframe(
dataframe=train_dataframe,
directory="/home/jack/project",
x_col="filename",
y_col="assessment",
subset="training",
seed=42,
batch_size=16,
shuffle=True,
class_mode="binary",
target_size=(224, 224)
)
# setting up a validation generator
validation_generator = datagen.flow_from_dataframe(
dataframe=train_dataframe,
directory="/home/jack/project/",
x_col="filename",
y_col="assessment",
subset="validation",
batch_size=16,
seed=42,
shuffle=True,
class_mode="binary",
target_size=(224, 224)
)
# Another augmentation but for test data
test_gen = ImageDataGenerator(rescale=1./255.)
# test generator set up
test_generator = test_gen.flow_from_dataframe(
dataframe=test_dataframe,
directory="/home/jack/project/",
x_col="filename",
y_col=None,
batch_size=16,
seed=42,
shuffle=False,
class_mode=None,
target_size=(224, 224)
)
# this function will yield the variables we need to work with in order to create a train and test set
# it will iterate through the generator
def my_iterator(generator):
    for img_batch, targets_batch in generator:
        yield test_generator.batch_size, targets_batch
# Train and Validation set creation
# The first problem is here
# 1: Invalid argument: Value Error: 'generator' yielded an element of shape (16,224,224,3) where an element
# of shape (224,) was expected.
train_set = tf.data.Dataset.from_generator(lambda: my_iterator(train_generator), output_shapes=(224, 244),
output_types=(tf.float32, tf.float32))
val_set = tf.data.Dataset.from_generator(lambda: my_iterator(validation_generator), output_shapes=(224, 224),
output_types=(tf.float32, tf.float32))
# we check the output of both validation and train sets
print(train_set)
print(val_set)
# This piece of code is where the other two issues are:
# 2: squeeze(axis=2) gives this error: ValueError: cannot select an axis to squeeze out which has size not equal to one
# 3: Issue 2 can be averted by setting axis=None, but the next problem is plt.show() gives an empty image.
for image, label in train_set.take(1):
    print("Image shape: ", image.numpy.shape())
    print("Label: ", label.numpy.shape())
    plt.imshow(image.numpy()[0].squeeze(axis=2) * 255)
    plt.show()
clf = ak.ImageClassifier(overwrite=True, max_trials=1, seed=5)
clf.fit(x=train_set, epochs=20)
print(clf.evaluate(val_set))
I mentioned the issues I face as comments in the code, but I will explain them again.
The biggest issue is the first one: ValueError: 'generator' yielded an element of shape (16,224,224,3) where an element of shape (224,) was expected. This happens when I try to initialize my test set.
What I tried:
Changing output_shapes to (224,224,3) and (16,224,224,3) (didn't help; it threw a different error saying "The two sequences do not have the same length")
Deleting batch_size from train_generator (this set it back to the default of 32, which my PC can't handle)
Changing target_size within the generators to (224,224,3) and (16,224,224,3) (didn't work)
Changing the number of variables that my_iterator yields (didn't work; error message: expected n values to unpack, where n is either 3 or 4, got 2)
Changing batch_size to a number that evenly divides the total number of images (didn't work; it throws the original error message)
How the data is stored:
Excel, single sheet, two columns (A and B) named filename and assessment. The filename column holds the paths to the images (e.g. "/subfolder/subfolder/subfolder/A2c3jc3291n.jpeg", without the quotes).
The assessments are the classes. There are only two in this case.

Custom Datagenerator

I have a custom file containing the paths to all my images and their labels which I load in a dataframe using:
MyIndex=pd.read_table('./MySet.txt')
MyIndex has two columns of interest: ImageName and ClassName.
Next I load the images, do a train/test split and encode the output labels:
images=[]
for index, row in MyIndex.iterrows():
    img_path=basePath+row['ImageName']
    img = image.load_img(img_path, target_size=(299, 299))
    img_path=None
    img_data = image.img_to_array(img)
    img=None
    images.append(img_data)
    img_data=None
images[0].shape
Classes=Sample['ClassName']
OutputClasses=Classes.unique().tolist()
labels=Sample['ClassName']
images=np.array(images, dtype="float") / 255.0
(trainX, testX, trainY, testY) = train_test_split(images,labels, test_size=0.10, random_state=42)
trainX, valX, trainY, valY = train_test_split(trainX, trainY, test_size=0.10, random_state=41)
images=None
labels=None
encoder = LabelEncoder()
encoder=encoder.fit(OutputClasses)
encoded_Y = encoder.transform(trainY)
# convert integers to dummy variables (i.e. one hot encoded)
trainY = to_categorical(encoded_Y, num_classes=len(OutputClasses))
encoded_Y = encoder.transform(valY)
# convert integers to dummy variables (i.e. one hot encoded)
valY = to_categorical(encoded_Y, num_classes=len(OutputClasses))
encoded_Y = encoder.transform(testY)
# convert integers to dummy variables (i.e. one hot encoded)
testY = to_categorical(encoded_Y, num_classes=len(OutputClasses))
datagen=ImageDataGenerator(rotation_range=90,horizontal_flip=True,vertical_flip=True,width_shift_range=0.25,height_shift_range=0.25)
datagen.fit(trainX,augment=True)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
batch_size=128
model.fit_generator(datagen.flow(trainX,trainY,batch_size=batch_size), epochs=500,
steps_per_epoch=trainX.shape[0]//batch_size,validation_data=(valX,valY))
The problem I face is that the data, loaded in one go, is too large to fit in the current machine's memory, so I am unable to work with the complete dataset.
I have tried to work with the data generator, but I do not want to follow the directory conventions it requires, and I also cannot drop the augmentation part.
The question is: is there a way to load batches from disk while satisfying these two conditions?
I believe you should have a look at this post
What you are looking for is Keras flow_from_dataframe, which lets you load the batches from disk by providing the names of your files and their labels in a dataframe, along with a top-level directory path that contains all your images.
Making a few modifications to your code and borrowing some from the link shared:
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size=128 # same value as in the question
MyIndex=pd.read_table('./MySet.txt')
Classes=MyIndex['ClassName']
OutputClasses=Classes.unique().tolist()
trainDf=MyIndex[['ImageName','ClassName']]
train, test = train_test_split(trainDf, test_size=0.10, random_state=1)
#creating a data generator to load the files on runtime
traindatagen=ImageDataGenerator(rotation_range=90,horizontal_flip=True,vertical_flip=True,width_shift_range=0.25,height_shift_range=0.25,
                                validation_split=0.1)
train_generator=traindatagen.flow_from_dataframe(
    dataframe=train,
    directory=basePath,#the directory containing all your images
    x_col='ImageName',
    y_col='ClassName',
    class_mode='categorical',
    target_size=(299, 299),
    batch_size=batch_size,
    subset='training'
)
#Also a generator for the validation data
val_generator=traindatagen.flow_from_dataframe(
    dataframe=train,
    directory=basePath,#the directory containing all your images
    x_col='ImageName',
    y_col='ClassName',
    class_mode='categorical',
    target_size=(299, 299),
    batch_size=batch_size,
    subset='validation'
)
STEP_SIZE_TRAIN=train_generator.n//train_generator.batch_size
STEP_SIZE_VALID=val_generator.n//val_generator.batch_size
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# in TensorFlow 2.x, model.fit accepts generators directly and fit_generator is deprecated
model.fit_generator(generator=train_generator, steps_per_epoch=STEP_SIZE_TRAIN,
                    validation_data=val_generator,
                    validation_steps=STEP_SIZE_VALID,
                    epochs=500)
Also note that you no longer need the label encoding from your original code, and you can omit the image loading code as well.
I have not tried this code myself, so you may need to fix any bugs you encounter; the primary aim was to convey the basic idea.
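One detail worth checking in the code above is the step count. Floor division (n // batch_size) leaves out the final partial batch, so each epoch draws slightly fewer samples than the dataset holds. The sketch below works through the arithmetic with hypothetical sizes; the helper name steps_per_epoch is just for illustration.

```python
import math

def steps_per_epoch(n_samples, batch_size, drop_remainder=True):
    """Number of batches per epoch; floor division leaves out the final partial batch."""
    if drop_remainder:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)

n_samples, batch_size = 1000, 128  # hypothetical dataset and batch sizes
floor_steps = steps_per_epoch(n_samples, batch_size)
ceil_steps = steps_per_epoch(n_samples, batch_size, drop_remainder=False)
print(floor_steps, floor_steps * batch_size)  # steps and samples drawn per epoch with floor division
print(ceil_steps)                             # steps needed to draw every sample once
```

Either choice is workable during training; rounding up matters more when every sample must be seen exactly once, for example during evaluation.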
In response to your comment:
If your files are in different directories, one solution is to have the ImageName column store the relative path including the intermediate directory, something like './Dir/File.jpg', then move all the directories into one folder and use that folder as the base path; everything else stays the same.
Also, looking at the code segment that loads the files, it looks like you already have file paths stored in the ImageName column, so the suggested approach should work for you.
images=[]
for index, row in MyIndex.iterrows():
    img_path=basePath+row['ImageName']
    img = image.load_img(img_path, target_size=(299, 299))
    img_path=None
    img_data = image.img_to_array(img)
    img=None
    images.append(img_data)
    img_data=None
If anything is still unclear, feel free to ask again.
I think the simplest way to do this would be to load only part of your images for each generator and repeatedly call .fit_generator() with that smaller subset.
The previous version of this example used random.random() to choose which images to load, but we can just as well use a start index and page size, as in this revised version, to loop over the list of images forever.
import itertools
def load_images(start_index, page_size):
    images = []
    for index in range(page_size):
        # Generate index using modulo to loop over the list forever
        index = (start_index + index) % len(MyIndex)
        row = MyIndex.iloc[index]  # positional lookup of one dataframe row
        img_path = basePath + row["ImageName"]
        img = image.load_img(img_path, target_size=(299, 299))
        img_data = image.img_to_array(img)
        images.append(img_data)
    return images
def generate_datagen(batch_size, start_index, page_size):
    images = load_images(start_index, page_size)
    # ... everything else you need to get from images to trainX and trainY, etc. here ...
    datagen = ImageDataGenerator(
        rotation_range=90,
        horizontal_flip=True,
        vertical_flip=True,
        width_shift_range=0.25,
        height_shift_range=0.25,
    )
    datagen.fit(trainX, augment=True)
    return (
        trainX,
        trainY,
        valX,
        valY,
        datagen.flow(trainX, trainY, batch_size=batch_size),
    )
model.compile(
    loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
page_size = 500  # load 500 images at a time; change this as suitable for your memory condition
for page in itertools.count():  # Count from zero to forever.
    batch_size = 128
    trainX, trainY, valX, valY, generator = generate_datagen(
        batch_size, page * page_size, page_size
    )
    model.fit_generator(
        generator,
        epochs=5,
        steps_per_epoch=trainX.shape[0] // batch_size,
        validation_data=(valX, valY),
    )
    # TODO: add a `break` clause with a suitable condition
If you want to load from disk, it is convenient to do so with the ImageDataGenerator that you already use.
There are two ways to do it: by stating the directory of the data with flow_from_directory, or alternatively by using flow_from_dataframe with a Pandas dataframe.
If you want to work from a list of paths, you can use a custom generator that yields batches of images. Here is a stub:
import numpy as np
def load_image_from_path(path):
    "Loading and preprocessing"
    ...
def my_generator():
    length = df.shape[0]
    while True:  # Keras expects the generator to loop over its data indefinitely
        for i in range(0, length, batch_size):
            batch = df.iloc[i:i + batch_size]  # positional slice; the end is exclusive
            x = np.array([load_image_from_path(p) for p in batch['ImageName']])
            y = batch['ClassName'].values
            yield x, y
Note: fit_generator has an additional argument named validation_data which, as you guessed, is for validation; it accepts a generator as well.
One option is to pass the generators the indices to choose from in order to split train and test (assuming the data is shuffled; if not, check this out).
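The index-based split mentioned above might look roughly like this. It is a sketch in plain Python: batch_generator, the made-up file names and the identity loader are all assumptions for illustration, and a real loader would read and preprocess the image at each path.

```python
import random

def batch_generator(paths, labels, indices, batch_size, load=lambda p: p):
    """Yield (inputs, targets) batches forever, using only the rows listed in `indices`."""
    while True:
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            yield [load(paths[i]) for i in chunk], [labels[i] for i in chunk]

# Hypothetical file list standing in for the ImageName / ClassName columns
paths = [f"img_{i}.jpg" for i in range(10)]
labels = ["a" if i % 2 == 0 else "b" for i in range(10)]

indices = list(range(len(paths)))
random.Random(42).shuffle(indices)  # shuffle once, reproducibly
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

train_batches = batch_generator(paths, labels, train_idx, batch_size=3)
first_x, first_y = next(train_batches)
print(first_x, first_y)
```

Because both generators share the same path list and differ only in their index lists, the train and test sets cannot overlap, and nothing is loaded until a batch is requested.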

How to deal with thousands of images for CNN training Keras

I have ~10000 images that cannot fit in memory. So for now I can only read 1000 images and train on them...
My code is here :
img_dir = "TrainingSet" # Enter Directory of all images
image_path = os.path.join(img_dir+"/images",'*.bmp')
files = glob.glob(image_path)
images = []
masks = []
contours = []
indexes = []
files_names = []
for f1 in np.sort(files):
    img = cv2.imread(f1)
    result = re.search('original_cropped_(.*).bmp', str(f1))
    idx = result.group(1)
    mask_path = img_dir+"/masks/mask_cropped_"+str(idx)+".bmp"
    mask = cv2.imread(mask_path,0)
    contour_path = img_dir+"/contours/contour_cropped_"+str(idx)+".bmp"
    contour = cv2.imread(contour_path,0)
    indexes.append(idx)
    images.append(img)
    masks.append(mask)
    contours.append(contour)
train_df = pd.DataFrame({"id":indexes,"masks": masks, "images": images,"contours": contours })
train_df.sort_values(by="id",ascending=True,inplace=True)
print(train_df.shape)
img_size_target = (256,256)
ids_train, ids_valid, x_train, x_valid, y_train, y_valid, c_train, c_valid = train_test_split(
train_df.index.values,
np.array(train_df.images.apply(lambda x: cv2.resize(x,img_size_target).reshape(img_size_target[0],img_size_target[1],3))),
np.array(train_df.masks.apply(lambda x: cv2.resize(x,img_size_target).reshape(img_size_target[0],img_size_target[1],1))),
np.array(train_df.contours.apply(lambda x: cv2.resize(x,img_size_target).reshape(img_size_target[0],img_size_target[1],1))),
test_size=0.2, random_state=1337)
#Here we define the model architecture...
#.....
#End of model definition
# Training
optimizer = Adam(lr=1e-3,decay=1e-10)
model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
early_stopping = EarlyStopping(patience=10, verbose=1)
model_checkpoint = ModelCheckpoint("./keras.model", save_best_only=True, verbose=1)
reduce_lr = ReduceLROnPlateau(factor=0.5, patience=5, min_lr=0.00001, verbose=1)
epochs = 200
batch_size = 32
history = model.fit(x_train, y_train,
validation_data=[x_valid, y_valid],
epochs=epochs,
batch_size=batch_size,
callbacks=[early_stopping, model_checkpoint, reduce_lr])
What I would like to know is how I can modify my code to train on small batches of images without loading all the other 10000 into memory. That is, the algorithm would read X images from the directory, train on them, and then go on to the next X until the last one.
X here would be a reasonable number of images that can fit into memory.
Use fit_generator instead of fit:
def generate_batch_data(num):
    while True:  # the generator has to keep yielding batches for as long as training runs
        # load the next num images and their masks here
        yield images, masks
# samples_per_epoch and nb_epoch are the Keras 1 names; Keras 2 uses steps_per_epoch and epochs
model.fit_generator(generate_batch_data(X),
                    samples_per_epoch=10000, nb_epoch=10)
Alternatively, you could use train_on_batch instead of fit.
Discussion on GitHub about this topic: https://github.com/keras-team/keras/issues/2708
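To make the generator idea more concrete, here is a sketch that pairs each image path with its mask path using the file naming scheme from the question and only loads files when a batch is requested. The helper names pair_paths and lazy_batches are made up for this example, and the demo loaders simply return the path; in practice they would call something like cv2.imread followed by cv2.resize.

```python
import re

def pair_paths(image_paths, img_dir):
    """Map each image path to its mask path via the index embedded in the file name."""
    pairs = []
    for path in sorted(image_paths):
        match = re.search(r"original_cropped_(.*)\.bmp", path)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        idx = match.group(1)
        pairs.append((path, f"{img_dir}/masks/mask_cropped_{idx}.bmp"))
    return pairs

def lazy_batches(pairs, batch_size, load_image, load_mask):
    """Yield (images, masks) batches forever, loading files only when a batch is requested."""
    while True:
        for start in range(0, len(pairs), batch_size):
            chunk = pairs[start:start + batch_size]
            yield [load_image(p) for p, _ in chunk], [load_mask(m) for _, m in chunk]

# Demo with made-up file names and loaders that just return the path
files = [f"TrainingSet/images/original_cropped_{i}.bmp" for i in range(5)]
pairs = pair_paths(files, "TrainingSet")
gen = lazy_batches(pairs, batch_size=2, load_image=lambda p: p, load_mask=lambda m: m)
images, masks = next(gen)
print(images, masks)
```

Only the list of paths stays in memory; at most batch_size images and masks are held at any one time.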
np.array(train_df.images.apply(lambda x:cv2.resize(x,img_size_target).reshape(img_size_target[0],img_size_target[1],3)))
You can first apply this transformation (and the two others) to each individual file in a separate script and save the results to dedicated folders (images_preproc, masks_preproc, etc.), then load them back, ready for use, in the current script.
Assuming that the actual image dimensions are greater than 256x256, you will get a faster algorithm that uses less memory, at the cost of a single preparation phase.

predict_generator and class labels

I am using ImageDataGenerator to generate new augmented images and extract bottleneck features from a pretrained model, but most of the tutorials I see on Keras sample the same number of training samples as there are images in the directory.
train_generator = train_datagen.flow_from_directory(
    train_path,
    target_size=image_size,
    shuffle=False,  # must be the boolean False; the string "false" is truthy
    class_mode='categorical',
    batch_size=1)
bottleneck_features_train = model.predict_generator(
    train_generator, 2* nb_train_samples // batch_size)
Suppose I want twice as many images from the above code. How can I get the class labels for the features extracted from the bottleneck layer? The labels are in the tuples yielded by train_generator.
Shouldn't the code in training_generator.py at line 422
x, _ = generator_output
do something like this
=> x, y = generator_output
and return the tuple [np.concatenate(out) for out in all_outs], y from predict_generator,
i.e. return the corresponding class labels along with the predicted features all_outs, since there is no way to get the corresponding labels without running the generator twice?
If you're using predict, you normally don't want Y, because Y will be the result of the prediction. (You're not training, so you don't need the true labels.)
But you can do it yourself:
bottleneck = []
labels = []
for i in range(2 * nb_train_samples // batch_size):
    x, y = next(train_generator)
    bottleneck.append(model.predict(x))
    labels.append(y)
bottleneck = np.concatenate(bottleneck)
labels = np.concatenate(labels)
If you want it with indexing (if your generator supports that):
#...
for epoch in range(2):
    for i in range(nb_train_samples // batch_size):
        x,y = train_generator[i]
        #...
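A side note on the generator setup, since the order of samples matters when pairing features with labels: shuffle needs to be the boolean False. A non-empty string such as "false" is truthy in Python, so, assuming the iterator simply tests the flag's truth value, it would still enable shuffling. The helper is_enabled below is only a stand-in for such a check.

```python
def is_enabled(flag):
    """Mimic a simple truthiness check such as `if shuffle:`."""
    return bool(flag)

print(is_enabled("false"))  # a non-empty string counts as true
print(is_enabled(False))    # the boolean False is the only "off" value here
```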
