I'm trying to classify several images at once in one directory using InceptionV3. I was able to extract bottleneck features using the flow from directory method and get the results of the predictions as an array.
My question is how do I know which image each prediction refers to, since they are out of order in the array. I tried to extract classes using generator_top.classes, but it takes them from the name of the directory where the images are.
I see that the predictions are all correct (those objects that are on the images, those are obtained in the original array).
I would only understand how to compare them.
In addition, I used this approach when testing images on a large test sample (there were folders with each class and images in them) and all the predictions in the array were displayed in order of folders in test dir
but when I try to do this from the same directory with different classes, I cannot compare the predictions with the images.
from keras import applications
base_model = applications.InceptionV3(include_top=False, weights='imagenet')
res_model = Sequential()
res_model.add(GlobalAveragePooling2D(input_shape=train_data.shape[1:]))
res_model.add(Dense(17, activation='softmax'))
datagen=ImageDataGenerator(rescale=1./255)
generator = datagen.flow_from_directory(
test_path,
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode=None,
shuffle=False)
nb_test_samples = len(generator.filenames) #17
predict_size_test = int(math.ceil(nb_test_samples/batch_size)) #2
bottleneck_features_test = base_model.predict_generator( generator,
predict_size_test, verbose=1)
np.save('bottleneck_features_test_10000inc.npy', bottleneck_features_test)
generator = datagen.flow_from_directory(
test_path,
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode=None,
shuffle=False)
nb_test_samples = len(generator_top.filenames)
test_data = np.load('bottleneck_features_test_10000inc.npy')
test_labels = generator_top.classes
test_labels = to_categorical(test_labels, num_classes=17)
tr_predictions=[np.argmax(res_model.predict(np.expand_dims(feature,axis=0)))for feature in test_data]
You can get the filenames of the files generated, and map it with the predicted results.
datagen = ImageDataGenerator()
gen = datagen.flow_from_directory(...)
for i in gen:
idx = (gen.batch_index - 1) * gen.batch_size
filenames = gen.filenames[idx : idx + gen.batch_size]
The filenames array will have the filenames, use functions like zip in python, to map the filenames and the predicted results.
Note 1 - This will work when
shuffle=False
in the generator, because the data may be shuffled after reading and you wont be able to track the shuffle.
Note 2 - this answer is inspired from
Keras flowFromDirectory get file names as they are being generated.
Related
I'm trying to build an image classification model. It's a 4 class image classification. Here is my code for building image generators and running the training:
train_datagen = ImageDataGenerator(rescale=1./255.,
rotation_range=30,
horizontal_flip=True,
validation_split=0.1)
train_generator = image_gen.flow_from_directory(train_dir, target_size=(299, 299),
class_mode='categorical', batch_size=20,
subset='training')
validation_generator = image_gen.flow_from_directory(train_dir, target_size=(299, 299),
class_mode='categorical', batch_size=20,
subset='validation')
model.compile(Adam(learning_rate=0.001), loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit_generator(train_generator, steps_per_epoch=int(440/20), epochs=20,
validation_data=validation_generator,
validation_steps=int(42/20))
I was able to get train and validation work perfectly because the images in train directory are stored in a separate folder for each class. But, as you can see below, the test directory has 100 images and no folders inside it. It also doesn't have any labels and only contains image files.
How can I do prediction on the image files in test folder using Keras?
If you are interested to only perform prediction, you can load the images by a simple hack like this:
test_datagen = ImageDataGenerator(rescale=1/255.)
test_generator = test_datagen('PATH_TO_DATASET_DIR/Dataset',
# only read images from `test` directory
classes=['test'],
# don't generate labels
class_mode=None,
# don't shuffle
shuffle=False,
# use same size as in training
target_size=(299, 299))
preds = model.predict_generator(test_generator)
You can access test_generator.filenames to get a list of corresponding filenames so that you can map them to their corresponding prediction.
Update (as requested in comments section): if you want to map predicted classes to filenames, first you must find the predicted classes. If your model is a classification model, then probably it has a softmax layer as the classifier. So the values in preds would be probabilities. Use np.argmax method to find the index with highest probability:
preds_cls_idx = preds.argmax(axis=-1)
So this gives you the indices of predicted classes. Now we need to map indices to their string labels (i.e. "car", "bike", etc.) which are provided by training generator in class_indices attribute:
import numpy as np
idx_to_cls = {v: k for k, v in train_generator.class_indices.items()}
preds_cls = np.vectorize(idx_to_cls.get)(preds_cls_idx)
filenames_to_cls = list(zip(test_generator.filenames, preds_cls))
your folder structure be like testfolder/folderofallclassfiles
you can use
test_generator = test_datagen.flow_from_directory(
directory=pred_dir,
class_mode=None,
shuffle=False
)
before prediction i would also use reset to avoid unwanted outputs
EDIT:
For your purpose you need to know which image is associated with which prediction. The problem is that the data-generator start at different positions in the dataset each time we use the generator, thus giving us different outputs everytime. So, in order to restart at the beginning of the dataset in each call to predict_generator() you would need to exactly match the number of iterations and batches to the dataset-size.
There are multiple ways to encounter this
a) You can see the internal batch-counter using batch_index of generator
b) create a new data-generator before each call to predict_generator()
c) there is a better and simpler way, which is to call reset() on the generator, and if you have set shuffle=False in flow_from_directory then it should start over from the beginning of the dataset and give the exact same output each time, so now the ordering of testgen.filenames and testgen.classes matches
test_generator.reset()
Prediction
prediction = model.predict_generator(test_generator,verbose=1,steps=numberofimages/batch_size)
To map the filename with prediction
predict_generator gives output in probabilities so at first we need to convert them to class number like 0,1..
predicted_class = np.argmax(prediction,axis=1)
next step would be to convert those class number into actual class names
l = dict((v,k) for k,v in training_set.class_indices.items())
prednames = [l[k] for k in predicted_classes]
getting filenames
filenames = test_generator.filenames
Finally creating df
finaldf = pd.DataFrame({'Filename': filenames,'Prediction': prednames})
I am using data augmentation for a model and would like to include the original unaugmented images as well as the augmented images, in training.
I have used the following code so far:
main_dir = "____" (file directory)
train_dir = os.path.join(main_dir, 'training_set')
validation_dir = os.path.join(main_dir, 'validation_set')
test_dir = os.path.join(main_dir, 'test_set')
conv_base = VGG16(weights='imagenet',include_top=False,input_shape=(150, 150, 3))
conv_base.summary()
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
conv_base.trainable = True
model.summary()
train_datagen = ImageDataGenerator(rescale=1./255,rotation_range=40,width_shift_range=0.2,height_shift_range=0.2,
shear_range=0.2,zoom_range=0.2,horizontal_flip=True,fill_mode='nearest')
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_dir,target_size=(150, 150),batch_size=20,class_mode='binary')
validation_generator = test_datagen.flow_from_directory(validation_dir,target_size=(150, 150),batch_size=20,class_mode='binary')
test_generator = test_datagen.flow_from_directory(test_dir,target_size=(150, 150),batch_size=20,class_mode='binary')
model.compile(loss='binary_crossentropy',optimizer=optimizers.RMSprop(lr=2e-5),metrics=['acc'])
start = time.time()
history=model.fit_generator(train_generator,steps_per_epoch=300,epochs=15,validation_data=validation_generator,validation_steps=50,verbose=2)
print("Time taken to train the MLP %.1f seconds."%(time.time()-start))
Please let me know if anyone is able to help! Thanks :)
So, ImageDataGenerator is not creating a batch of images that you can simply add images to. It is creating randomly transformed images on each epoch. while this may hurt your performance on the training data, it should improve performance on the validation and test data. Speaking of which, you shouldn'T run the image generator on your validation and test data because it will randomly change them every time and give your model a moving target.
This tutorial goes over the subject pretty well and should be helpful
I am trying to check the performance of my model on the validation-dataset. As such, I am using predict_generator to return predictions from my validation_generator. However, I am not able to match the predictions with true labels returned from validation_generator.classes since the order of my predictions is mixed up.
This is how I initialize my generator:
BATCH_SIZE = 64
data_generator = ImageDataGenerator(rescale=1./255,
validation_split=0.20)
train_generator = data_generator.flow_from_directory(main_path, target_size=(IMAGE_HEIGHT, IMAGE_SIZE), shuffle=False, seed=13,
class_mode='categorical', batch_size=BATCH_SIZE, subset="training")
validation_generator = data_generator.flow_from_directory(main_path, target_size=(IMAGE_HEIGHT, IMAGE_SIZE), shuffle=False, seed=13,
class_mode='categorical', batch_size=BATCH_SIZE, subset="validation")
#Found 4473 images belonging to 3 classes.
#Found 1116 images belonging to 3 classes.
Now I am using the predict_generator like so:
validation_steps_per_epoch = np.math.ceil(validation_generator.samples / validation_generator.batch_size)
predictions = model.predict_generator(validation_generator, steps=validation_steps_per_epoch)
I realize that there is a mismatch between my validation-data size (=1116) and validation_steps_per_epoch (=1152). Since these two dont match, I find the output predictions is different each time I run model.predict_generator(...).
Is there any way to fix this besides changing batch_size to 1 in order to make sure that generator steps through all samples?
I found someone with a similar issue here keras predict_generator is shuffling its output when using a keras.utils.Sequence, however his solution does not fix my problem since I am not writing any custom functions.
There is no randomization or shuffling going on, what happens is that since the batch size of the validation generator does not exactly divide the number of samples, then the leftover samples spill into the next time the generator is called, which messes up everything.
What you could do is set a batch size for the validation generator that divides exactly the number of validation samples, or set the batch size to one.
I've used ImageDataGenerator and flow_from_directory for training and validation.
These are my directories:
train_dir = Path('D:/Datasets/Trell/images/new_images/training')
test_dir = Path('D:/Datasets/Trell/images/new_images/validation')
pred_dir = Path('D:/Datasets/Trell/images/new_images/testing')
ImageGenerator Code:
img_width, img_height = 28, 28
batch_size=32
train_datagen = ImageDataGenerator(
rescale=1. / 255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
train_dir,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
test_dir,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical')
Found 1852 images belonging to 4 classes
Found 115 images belonging to 4 classes
This is my model training code:
history = cnn.fit_generator(
train_generator,
steps_per_epoch=1852 // batch_size,
epochs=20,
validation_data=validation_generator,
validation_steps=115 // batch_size)
Now I have some new images in a test folder (all images are inside the same folder only), on which I want to predict. But when I use .predict_generator I get:
Found 0 images belonging to 0 class
So I tried these solutions:
1) Keras: How to use predict_generator with ImageDataGenerator? This didn't work out, because its trying on validation set only.
2) How to predict the new image by using model.predict? module image not found
3) How to get predictions with predict_generator on streaming test data in Keras? This also didn't work out.
My train data is basically stored in 4 separate folders, i.e. 4 specific classes, validation also stored in same way and works out pretty well.
So in my test folder I have around 300 images, on which I want to predict and make a dataframe, like this:
image_name class
gghh.jpg 1
rrtq.png 2
1113.jpg 1
44rf.jpg 4
tyug.png 1
ssgh.jpg 3
I have also used this following code:
img = image.load_img(pred_dir, target_size=(28, 28))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.
cnn.predict(img_tensor)
But I get this error: [Errno 13] Permission denied: 'D:\\Datasets\\Trell\\images\\new_images\\testing'
But I haven't been able to predict_generator on my test images. So how can I predict on my new images using Keras. I have googled a lot, searched on Kaggle Kernels also but haven't been able to get a solution.
So first of all the test images should be placed inside a separate folder inside the test folder. So in my case I made another folder inside test folder and named it all_classes.
Then ran the following code:
test_generator = test_datagen.flow_from_directory(
directory=pred_dir,
target_size=(28, 28),
color_mode="rgb",
batch_size=32,
class_mode=None,
shuffle=False
)
The above code gives me an output:
Found 306 images belonging to 1 class
And most importantly you've to write the following code:
test_generator.reset()
else weird outputs will come.
Then using the .predict_generator() function:
pred=cnn.predict_generator(test_generator,verbose=1,steps=306/batch_size)
Running the above code will give output in probabilities so at first I need to convert them to class number. In my case it was 4 classes, so class numbers were 0,1,2 and 3.
Code written:
predicted_class_indices=np.argmax(pred,axis=1)
Next step is I want the name of the classes:
labels = (train_generator.class_indices)
labels = dict((v,k) for k,v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]
Where by class numbers will be replaced by the class names. One final step if you want to save it to a csv file, arrange it in a dataframe with the image names appended with the class predicted.
filenames=test_generator.filenames
results=pd.DataFrame({"Filename":filenames,
"Predictions":predictions})
Display your dataframe. Everything is done now. You get all the predicted class for your images.
I had some trouble with predict_generator(). Some posts here helped a lot. I post my solution here as well and hope it will help others. What I do:
Make predictions on new images using predict_generator()
Get filename for each prediction
Store results in a data frame
I make binary predictions à la "cats and dogs" as documented here. However, the logic can be generalised to multiclass cases. In this case the outcome of the prediction has one column per class.
First, I load my stored model and set up the data generator:
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model
# Load model
model = load_model('my_model_01.hdf5')
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
"C:/kerasimages/pred/",
target_size=(150, 150),
batch_size=20,
class_mode='binary',
shuffle=False)
Note: it is important to specify shuffle=False in order to preserve the order of filenames and predictions.
Images are stored in C:/kerasimages/pred/images/. The data generator will only look for images in subfolders of C:/kerasimages/pred/ (as specified in test_generator). It is important to respect the logic of the data generator, so the subfolder /images/ is required. Each subfolder in C:/kerasimages/pred/ is interpreted as one class by the generator. Here, the generator will report Found x images belonging to 1 classes (since there is only one subfolder). If we make predictions, classes (as detected by the generator) are not relevant.
Now, I can make predictions using the generator:
# Predict from generator (returns probabilities)
pred=model.predict_generator(test_generator, steps=len(test_generator), verbose=1)
Resetting the generator is not required in this case, but if a generator has been set up before, it may be necessary to rest it using test_generator.reset().
Next I round probabilities to get classes and I retrieve filenames:
# Get classes by np.round
cl = np.round(pred)
# Get filenames (set shuffle=false in generator is important)
filenames=test_generator.filenames
Finally, results can be stored in a data frame:
# Data frame
results=pd.DataFrame({"file":filenames,"pr":pred[:,0], "class":cl[:,0]})
I strongly recommend you to make a parent folder in the test folder. Then move the test folder to the parent folder.
means if you have test folder in this manner:
/root/test/img1.png
/root/test/img2.png
/root/test/img3.png
/root/test/img4.png
this wrong way to use predict_generator. Update your test folder like this:
/root/test_parent/test/img1.png
/root/test_parent/test/img2.png
/root/test_parent/test/img3.png
/root/test_parent/test/img4.png
Use this command to update:
mv /root/test/ ./root/test_parent/test
And, also don't forget to give a path to the model like this
"/root/test_parent/"
This method is work for me.
The most probably you are making a mistake using flow_from_directory. Reading the docs:
flow_from_directory(directory, ...)
Where:
directory: Path to the target directory. It should contain one
subdirectory per class. Any PNG, JPG, BMP, PPM or TIF images inside
each of the subdirectories directory tree will be included in the
generator.
That means that inside the directory that you are passing to this function, you have to create subdirectories and place your images inside this subdirectories. Otherwise, when the images are in the directory that you are passing (not subdirectories), indeed there are 0 images and 0 classes.
EDIT
Okay so in case of the prediction you want to perform I believe that you want to use the predict function as follows: (note that you have to provide data to the network just in the same format as you did during learning process)
image = img_to_array(load_img(f"{directory}/{foldername}/{filename}"))
# here you prepare the input data, for example here we take the gray image
# gray scale is the 1st channel in the Lab color space
color_me = rgb2lab((1.0 / 255) * color_me)[:, :, 0]
color_me = color_me.reshape(color_me.shape + (1,))
# here data is in the format which is accepted by, in this case, my model
# for your model you have to do the preparation just the same as in the case of learning process
output = model.predict(np.array([color_me]))
# and here you have your predicted output
As per Keras documenation cited below, predict_generator is deprecated. Model.predict now supports generators, so there is no longer any need to use the predict_generator endpoint.
Keras documentation, Refernce: https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict_generator
I'm using the ImageDataGenerator inside Keras to read a directory of images. I'd like to save the result inside a numpy array, so I can do further manipulations and save it to disk in one file.
flow_from_directory() returns an iterator, which is why I tried the following
itr = gen.flow_from_directory('data/train/', batch_size=1, target_size=(32,32))
imgs = np.concatenate([itr.next() for i in range(itr.nb_sample)])
but that produced
ValueError: could not broadcast input array from shape (32,32,3) into shape (1)
I think I'm misusing the concatenate() function, but I can't figure out where I fail.
I had the same problem and solved it the following way:
itr.next returns the next batch of images as two numpy.ndarray objects: batch_x, batch_y. (Source: keras/preprocessing/image.py)
So what you can do is set the batch_size for flow_from_directory to the size of your whole train dataset.
Example, my whole training set consists of 1481 images:
train_datagen = ImageDataGenerator(rescale=1. / 255)
itr = train_datagen.flow_from_directory(
train_data_dir,
target_size=(img_width, img_height),
batch_size=1481,
class_mode='categorical')
X, y = itr.next()
While using ImageDataGenerator, the data is loaded in the format of the directoryiterator.
you can extract it as batches or as a whole
train_generator = train_datagen.flow_from_directory(
train_parent_dir,
target_size=(300, 300),
batch_size=32,
class_mode='categorical'
)
the output of which is
Found 3875 images belonging to 3 classes.
to extract as numpy array as a whole(which means not as a batch), this code can be used
x=np.concatenate([train_generator.next()[0] for i in range(train_generator.__len__())])
y=np.concatenate([train_generator.next()[1] for i in range(train_generator.__len__())])
print(x.shape)
print(y.shape)
NOTE:BEFORE THIS CODE IT IS ADVISED TO USE train_generator.reset()
the output of above code is
(3875, 300, 300, 3)
(3875, 3)
The output is obtained as a numpy array together, even though it was loaded as batches of 32 using ImageDataGenerator.
To get the output as batches use the following code
x=[]
y=[]
train_generator.reset()
for i in range(train_generator.__len__()):
a,b=train_generator.next()
x.append(a)
y.append(b)
x=np.array(x)
y=np.array(y)
print(x.shape)
print(y.shape)
the output of the code is
(122,)
(122,)
Hope this works as a solution