How to split dataset to images and labels using tensorflow

How to split dataset to images and labels using tensorflow - python

I imported image dataset using tensorflow.keras.preprocessing.image_dataset_from_directory():
train = tf.keras.preprocessing.image_dataset_from_directory(
'chest_xray/train/',
labels = 'inferred',
label_mode = 'categorical',
class_names = ['NORMAL', 'PNEUMONIA'],
color_mode = 'grayscale',
batch_size = batch_size,
image_size = (image_height, image_width),
shuffle = True
)
I tried to split the dataset into images and labels, however I was unable to do so. Is there a method to split train to train_images and train_labels?

You will be creating two directories, i.e. in your case two folders with one named NORMAL and another named PNEUMONIA. Your directory structure should be something like this
main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg
Your main directory will be having a folder with class names and their respective images inside. So here you won't be splitting you will be directly sending off the data packed under its label itself. You can refer to the documentation here
If you are looking for the code to split training and testing data, Let me know.

Related

How to add data augmentation to regression problem?

I am trying to build a CNN model for regression problem with limited number of input data with 400 sample size. The inputs are images and labels are extracted from a column of csv file. To increase the input data, I need to augment the input images and match them with existing labels. I am using rotation and flipping augmentation methods. I am not sure how existing labels should be linked to the augmented images and how the final tensorflow dataset should be created to fit the model. Can anyone help me to solve this data augmentation?
#load csv file
labelPath = "/content/drive/MyDrive/Notebook/tepm.csv"
cols = ["temperature"]
df = pd.read_csv(labelPath, sep=" ", header=None, names=cols)
inputPath='/content/drive/MyDrive/Notebook/test_png_64'
images = []
# Load in the images
for filepath in os.listdir(inputPath):
images.append(cv2.imread(inputPath+'/{0}'.format(filepath),flags=(cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)))
images_scaled = np.array(images, dtype="float") / 255.0
(trainY, testY, trainX, testX) = train_test_split(df, images_scaled, test_size=0.20, random_state=42)
(trainY, valY, trainX, valX) = train_test_split(trainY, trainX, test_size=0.20, random_state=42)
def rotate(trainX: tf.Tensor) -> tf.Tensor:
# Rotate 90 degrees
return tf.image.rot90(trainX, tf.random_uniform(shape=[], minval=0, maxval=4, dtype=tf.float32))
def flip(trainX: tf.Tensor) -> tf.Tensor:
trainX = tf.image.random_flip_left_right(trainX)
trainX = tf.image.random_flip_up_down(trainX)
return trainX
update with ImageDataGenerator
datagen = ImageDataGenerator(
vertical_flip=True,
horizontal_flip=True,
fill_mode="nearest")
datagen.fit(trainX)
model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.0001), loss='mean_squared_error', metrics='mse')

ImageDataGenerator should do the trick. It generate batches of tensor image data with real-time data augmentation.

Initially you should consider whether augmentations that preserve the labels are useful, or augmentations that require matching label augmentation, or both. If I am following your code correctly, you have temperature for scalar labels. Without knowing the nature of your images, I'd guess it unlikely that rotations and flips would be temperature-dependent, thus the labels are preserved and you are all set to go with ImageDataGenerator as is. Whether or not those augmentations will help the training is hard to know without trying it. Conversely, ImageDataGenerator does have a Brightness augmentation, which is the sort of thing I could imagine being temperature dependent in an image. In that case, the labels aren't preserved and you'd have to augment them manually, because I don't think ImageDataGenerator has methods for scalar labels. In my experience, it is the latter sort of augmentations (labels not-preserved) which are more obviously useful. But to get matching label augmentation you may have to do a little more manual coding than what comes stock with ImageDataGenerator; fortunately it might not be too hard.
Some of the basic elements for matching label augmentation might go like this (this is not complete code, just snippets):
Set up the subset of parameters for ImageDataGenerator augmentation that make sense for your scalar labels in a convenience dict:
regression_aug = dict(fill_mode='nearest',
rotation_range=3,
width_shift_range=0.1,
height_shift_range=0.1,
Use the ImageDataGenerator method get_random_transform:
self.tparams[i] = self.generator.get_random_transform(self.img_dims)
Apply it to the training image, and further manually apply it to the scalar label(s):
batch_X[i] = self.generator.apply_transform(img[i], self.tparams[i]))
batch_y[i,0] = self.lbl[x,0] - self.tparams[i]['tx']
batch_y[i,1] = self.lbl[x,1] - self.tparams[i]['ty']
batch_y[i,2] = self.lbl[x,2] - self.tparams[i]['theta']
where in this example case I had scalar labels that consisted of position and orientation, such that they could be sensibly be translated and rotated during augmentation.

Classify multiple images in one go with InceptionV3

I'm trying to classify several images at once in one directory using InceptionV3. I was able to extract bottleneck features using the flow from directory method and get the results of the predictions as an array.
My question is how do I know which image each prediction refers to, since they are out of order in the array. I tried to extract classes using generator_top.classes, but it takes them from the name of the directory where the images are.
I see that the predictions are all correct (those objects that are on the images, those are obtained in the original array).
I would only understand how to compare them.
In addition, I used this approach when testing images on a large test sample (there were folders with each class and images in them) and all the predictions in the array were displayed in order of folders in test dir
but when I try to do this from the same directory with different classes, I cannot compare the predictions with the images.
from keras import applications
base_model = applications.InceptionV3(include_top=False, weights='imagenet')
res_model = Sequential()
res_model.add(GlobalAveragePooling2D(input_shape=train_data.shape[1:]))
res_model.add(Dense(17, activation='softmax'))
datagen=ImageDataGenerator(rescale=1./255)
generator = datagen.flow_from_directory(
test_path,
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode=None,
shuffle=False)
nb_test_samples = len(generator.filenames) #17
predict_size_test = int(math.ceil(nb_test_samples/batch_size)) #2
bottleneck_features_test = base_model.predict_generator( generator,
predict_size_test, verbose=1)
np.save('bottleneck_features_test_10000inc.npy', bottleneck_features_test)
generator = datagen.flow_from_directory(
test_path,
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode=None,
shuffle=False)
nb_test_samples = len(generator_top.filenames)
test_data = np.load('bottleneck_features_test_10000inc.npy')
test_labels = generator_top.classes
test_labels = to_categorical(test_labels, num_classes=17)
tr_predictions=[np.argmax(res_model.predict(np.expand_dims(feature,axis=0)))for feature in test_data]

You can get the filenames of the files generated, and map it with the predicted results.
datagen = ImageDataGenerator()
gen = datagen.flow_from_directory(...)
for i in gen:
idx = (gen.batch_index - 1) * gen.batch_size
filenames = gen.filenames[idx : idx + gen.batch_size]
The filenames array will have the filenames, use functions like zip in python, to map the filenames and the predicted results.
Note 1 - This will work when
shuffle=False
in the generator, because the data may be shuffled after reading and you wont be able to track the shuffle.
Note 2 - this answer is inspired from
Keras flowFromDirectory get file names as they are being generated.

How to perform prediction using predict_generator on unlabeled test data in Keras?

I'm trying to build an image classification model. It's a 4 class image classification. Here is my code for building image generators and running the training:
train_datagen = ImageDataGenerator(rescale=1./255.,
rotation_range=30,
horizontal_flip=True,
validation_split=0.1)
train_generator = image_gen.flow_from_directory(train_dir, target_size=(299, 299),
class_mode='categorical', batch_size=20,
subset='training')
validation_generator = image_gen.flow_from_directory(train_dir, target_size=(299, 299),
class_mode='categorical', batch_size=20,
subset='validation')
model.compile(Adam(learning_rate=0.001), loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit_generator(train_generator, steps_per_epoch=int(440/20), epochs=20,
validation_data=validation_generator,
validation_steps=int(42/20))
I was able to get train and validation work perfectly because the images in train directory are stored in a separate folder for each class. But, as you can see below, the test directory has 100 images and no folders inside it. It also doesn't have any labels and only contains image files.
How can I do prediction on the image files in test folder using Keras?

If you are interested to only perform prediction, you can load the images by a simple hack like this:
test_datagen = ImageDataGenerator(rescale=1/255.)
test_generator = test_datagen('PATH_TO_DATASET_DIR/Dataset',
# only read images from `test` directory
classes=['test'],
# don't generate labels
class_mode=None,
# don't shuffle
shuffle=False,
# use same size as in training
target_size=(299, 299))
preds = model.predict_generator(test_generator)
You can access test_generator.filenames to get a list of corresponding filenames so that you can map them to their corresponding prediction.
Update (as requested in comments section): if you want to map predicted classes to filenames, first you must find the predicted classes. If your model is a classification model, then probably it has a softmax layer as the classifier. So the values in preds would be probabilities. Use np.argmax method to find the index with highest probability:
preds_cls_idx = preds.argmax(axis=-1)
So this gives you the indices of predicted classes. Now we need to map indices to their string labels (i.e. "car", "bike", etc.) which are provided by training generator in class_indices attribute:
import numpy as np
idx_to_cls = {v: k for k, v in train_generator.class_indices.items()}
preds_cls = np.vectorize(idx_to_cls.get)(preds_cls_idx)
filenames_to_cls = list(zip(test_generator.filenames, preds_cls))

your folder structure be like testfolder/folderofallclassfiles
you can use
test_generator = test_datagen.flow_from_directory(
directory=pred_dir,
class_mode=None,
shuffle=False
)
before prediction i would also use reset to avoid unwanted outputs
EDIT:
For your purpose you need to know which image is associated with which prediction. The problem is that the data-generator start at different positions in the dataset each time we use the generator, thus giving us different outputs everytime. So, in order to restart at the beginning of the dataset in each call to predict_generator() you would need to exactly match the number of iterations and batches to the dataset-size.
There are multiple ways to encounter this
a) You can see the internal batch-counter using batch_index of generator
b) create a new data-generator before each call to predict_generator()
c) there is a better and simpler way, which is to call reset() on the generator, and if you have set shuffle=False in flow_from_directory then it should start over from the beginning of the dataset and give the exact same output each time, so now the ordering of testgen.filenames and testgen.classes matches
test_generator.reset()
Prediction
prediction = model.predict_generator(test_generator,verbose=1,steps=numberofimages/batch_size)
To map the filename with prediction
predict_generator gives output in probabilities so at first we need to convert them to class number like 0,1..
predicted_class = np.argmax(prediction,axis=1)
next step would be to convert those class number into actual class names
l = dict((v,k) for k,v in training_set.class_indices.items())
prednames = [l[k] for k in predicted_classes]
getting filenames
filenames = test_generator.filenames
Finally creating df
finaldf = pd.DataFrame({'Filename': filenames,'Prediction': prednames})

How to use .predict_generator() on new Images - Keras

I've used ImageDataGenerator and flow_from_directory for training and validation.
These are my directories:
train_dir = Path('D:/Datasets/Trell/images/new_images/training')
test_dir = Path('D:/Datasets/Trell/images/new_images/validation')
pred_dir = Path('D:/Datasets/Trell/images/new_images/testing')
ImageGenerator Code:
img_width, img_height = 28, 28
batch_size=32
train_datagen = ImageDataGenerator(
rescale=1. / 255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
train_dir,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
test_dir,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical')
Found 1852 images belonging to 4 classes
Found 115 images belonging to 4 classes
This is my model training code:
history = cnn.fit_generator(
train_generator,
steps_per_epoch=1852 // batch_size,
epochs=20,
validation_data=validation_generator,
validation_steps=115 // batch_size)
Now I have some new images in a test folder (all images are inside the same folder only), on which I want to predict. But when I use .predict_generator I get:
Found 0 images belonging to 0 class
So I tried these solutions:
1) Keras: How to use predict_generator with ImageDataGenerator? This didn't work out, because its trying on validation set only.
2) How to predict the new image by using model.predict? module image not found
3) How to get predictions with predict_generator on streaming test data in Keras? This also didn't work out.
My train data is basically stored in 4 separate folders, i.e. 4 specific classes, validation also stored in same way and works out pretty well.
So in my test folder I have around 300 images, on which I want to predict and make a dataframe, like this:
image_name class
gghh.jpg 1
rrtq.png 2
1113.jpg 1
44rf.jpg 4
tyug.png 1
ssgh.jpg 3
I have also used this following code:
img = image.load_img(pred_dir, target_size=(28, 28))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.
cnn.predict(img_tensor)
But I get this error: [Errno 13] Permission denied: 'D:\\Datasets\\Trell\\images\\new_images\\testing'
But I haven't been able to predict_generator on my test images. So how can I predict on my new images using Keras. I have googled a lot, searched on Kaggle Kernels also but haven't been able to get a solution.

So first of all the test images should be placed inside a separate folder inside the test folder. So in my case I made another folder inside test folder and named it all_classes.
Then ran the following code:
test_generator = test_datagen.flow_from_directory(
directory=pred_dir,
target_size=(28, 28),
color_mode="rgb",
batch_size=32,
class_mode=None,
shuffle=False
)
The above code gives me an output:
Found 306 images belonging to 1 class
And most importantly you've to write the following code:
test_generator.reset()
else weird outputs will come.
Then using the .predict_generator() function:
pred=cnn.predict_generator(test_generator,verbose=1,steps=306/batch_size)
Running the above code will give output in probabilities so at first I need to convert them to class number. In my case it was 4 classes, so class numbers were 0,1,2 and 3.
Code written:
predicted_class_indices=np.argmax(pred,axis=1)
Next step is I want the name of the classes:
labels = (train_generator.class_indices)
labels = dict((v,k) for k,v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]
Where by class numbers will be replaced by the class names. One final step if you want to save it to a csv file, arrange it in a dataframe with the image names appended with the class predicted.
filenames=test_generator.filenames
results=pd.DataFrame({"Filename":filenames,
"Predictions":predictions})
Display your dataframe. Everything is done now. You get all the predicted class for your images.

I had some trouble with predict_generator(). Some posts here helped a lot. I post my solution here as well and hope it will help others. What I do:
Make predictions on new images using predict_generator()
Get filename for each prediction
Store results in a data frame
I make binary predictions à la "cats and dogs" as documented here. However, the logic can be generalised to multiclass cases. In this case the outcome of the prediction has one column per class.
First, I load my stored model and set up the data generator:
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model
# Load model
model = load_model('my_model_01.hdf5')
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
"C:/kerasimages/pred/",
target_size=(150, 150),
batch_size=20,
class_mode='binary',
shuffle=False)
Note: it is important to specify shuffle=False in order to preserve the order of filenames and predictions.
Images are stored in C:/kerasimages/pred/images/. The data generator will only look for images in subfolders of C:/kerasimages/pred/ (as specified in test_generator). It is important to respect the logic of the data generator, so the subfolder /images/ is required. Each subfolder in C:/kerasimages/pred/ is interpreted as one class by the generator. Here, the generator will report Found x images belonging to 1 classes (since there is only one subfolder). If we make predictions, classes (as detected by the generator) are not relevant.
Now, I can make predictions using the generator:
# Predict from generator (returns probabilities)
pred=model.predict_generator(test_generator, steps=len(test_generator), verbose=1)
Resetting the generator is not required in this case, but if a generator has been set up before, it may be necessary to rest it using test_generator.reset().
Next I round probabilities to get classes and I retrieve filenames:
# Get classes by np.round
cl = np.round(pred)
# Get filenames (set shuffle=false in generator is important)
filenames=test_generator.filenames
Finally, results can be stored in a data frame:
# Data frame
results=pd.DataFrame({"file":filenames,"pr":pred[:,0], "class":cl[:,0]})

I strongly recommend you to make a parent folder in the test folder. Then move the test folder to the parent folder.
means if you have test folder in this manner:
/root/test/img1.png
/root/test/img2.png
/root/test/img3.png
/root/test/img4.png
this wrong way to use predict_generator. Update your test folder like this:
/root/test_parent/test/img1.png
/root/test_parent/test/img2.png
/root/test_parent/test/img3.png
/root/test_parent/test/img4.png
Use this command to update:
mv /root/test/ ./root/test_parent/test
And, also don't forget to give a path to the model like this
"/root/test_parent/"
This method is work for me.

The most probably you are making a mistake using flow_from_directory. Reading the docs:
flow_from_directory(directory, ...)
Where:
directory: Path to the target directory. It should contain one
subdirectory per class. Any PNG, JPG, BMP, PPM or TIF images inside
each of the subdirectories directory tree will be included in the
generator.
That means that inside the directory that you are passing to this function, you have to create subdirectories and place your images inside this subdirectories. Otherwise, when the images are in the directory that you are passing (not subdirectories), indeed there are 0 images and 0 classes.
EDIT
Okay so in case of the prediction you want to perform I believe that you want to use the predict function as follows: (note that you have to provide data to the network just in the same format as you did during learning process)
image = img_to_array(load_img(f"{directory}/{foldername}/{filename}"))
# here you prepare the input data, for example here we take the gray image
# gray scale is the 1st channel in the Lab color space
color_me = rgb2lab((1.0 / 255) * color_me)[:, :, 0]
color_me = color_me.reshape(color_me.shape + (1,))
# here data is in the format which is accepted by, in this case, my model
# for your model you have to do the preparation just the same as in the case of learning process
output = model.predict(np.array([color_me]))
# and here you have your predicted output

As per Keras documenation cited below, predict_generator is deprecated. Model.predict now supports generators, so there is no longer any need to use the predict_generator endpoint.
Keras documentation, Refernce: https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict_generator

Reading data into tensorflow and creating Dataset with TF-slim

I need to read in many 'images' from .txt files and want to generate a tensorflow dataset with them. Currently, I read in every single matrix with numpy.loadtxt and create an array of shape [N_matrices, height, width, N_channels], and a similar array with the label for every matrix.
I create a tensorflow dataset from these two arrays by using
inputs = tf.convert_to_tensor(x_train, dtype=tf.float32)
labels = tf.convert_to_tensor(y_train, dtype=tf.float32)
dataset = tf.data.Dataset.from_tensor_slices( {"image": inputs,"label": labels})
I now want to make use of the following function to create batches from this dataset (as done here):
def load_batch(dataset, batch_size=BATCH_SIZE, height=LENGTH_INPUT, width=LENGTH_INPUT):
data_provider = slim.dataset_data_provider.DatasetDataProvider(dataset)
image, label = data_provider.get(['image', 'label'])
images, labels = tf.train.batch(
[image, label],
batch_size=batch_size,
allow_smaller_final_batch=True)
return images, labels
However, this gives me the following error:
data_provider = slim.dataset_data_provider.DatasetDataProvider(dataset)
File "/home/.local/lib/python3.5/site-packages/tensorflow/contrib/slim/python/slim/data/dataset_data_provider.py", line 85, in init
dataset.data_sources,
AttributeError: 'TensorSliceDataset' object has no attribute 'data_sources'
Why am I getting this error, and how can I fix it? I also suppose there are much better ways for handling input from txt files to tensorflow (or tensorflow-slim) but I've found very little information on this. How could I generate my Datasets in a better way?

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to split dataset to images and labels using tensorflow - python

Related

How to add data augmentation to regression problem?

Classify multiple images in one go with InceptionV3

How to perform prediction using predict_generator on unlabeled test data in Keras?

How to use .predict_generator() on new Images - Keras

Reading data into tensorflow and creating Dataset with TF-slim

Categories

Resources