Loading Custom Dataset via Keras - python

I’ve got a simple Keras-based GAN model that I use to generate handwritten digit images from the MNIST dataset. I want to build a similar dataset for Keras from the raw image data of the Sokoto Coventry Fingerprint Dataset (SOCOFing), which consists of 6,000 black-and-white fingerprint samples, and feed it to the same GAN model. The problem is that I'm stuck on creating and loading/processing the custom dataset.
This is the code from the model that I use for MNIST:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # must be set before Keras is imported to take effect

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from tensorflow.keras.layers import Input, Dense, Dropout, LeakyReLU
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.datasets import mnist
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import initializers
np.random.seed(10)
random_dim = 100
def load_mnist_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = (x_train.astype(np.float32) - 127.5)/127.5
    x_train = x_train.reshape(60000, 784)
    return (x_train, y_train, x_test, y_test)
For experimentation purposes, I’ve created a smaller version of the SOCOFing dataset that contains 500 samples only. The code for the dataset generator is as follows:
from PIL import Image
import os
import numpy as np

path_to_files = "./fingerprints/"
vectorized_images_X = []
vectorized_images_Y = []
for file in os.listdir(path_to_files):
    image = Image.open(os.path.join(path_to_files, file))
    image_array = np.array(image)
    vectorized_images_X.append(image_array)
    vectorized_images_Y.append(image_array)  # placeholder: no real labels yet
np.savez("./fingerprints.npz", DataX=vectorized_images_X, DataY=vectorized_images_Y)
import numpy as np

path = "./fingerprints.npz"
with np.load(path) as data:
    train_data = data['DataX']
    print(train_data)
    test_data = data['DataY']
    print(test_data)
So now I’ve got a *.npz file but don’t know how to inject it into the model. Please advise.

Here is sample code for loading any .npz file; you can adapt it to your case.
import numpy as np
import tensorflow as tf

DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'
path = tf.keras.utils.get_file('mnist.npz', DATA_URL)
with np.load(path) as data:
    train_examples = data['x_train']
    train_labels = data['y_train']
    test_examples = data['x_test']
    test_labels = data['y_test']
For more details, you can follow the TensorFlow tutorial on loading NumPy data: https://www.tensorflow.org/tutorials/load_data/numpy
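To actually inject the .npz file into the GAN, one option is a loader that mirrors load_mnist_data(). A minimal sketch, assuming the fingerprint images are grayscale and all share the same resolution (load_fingerprint_data is a hypothetical name, not part of the original code):

import numpy as np

def load_fingerprint_data(path="./fingerprints.npz"):
    with np.load(path) as data:
        x_train = data["DataX"]
    # Scale pixel values from [0, 255] to [-1, 1], as load_mnist_data() does
    x_train = (x_train.astype(np.float32) - 127.5) / 127.5
    # Flatten each image so the GAN's Dense layers can consume it
    x_train = x_train.reshape(x_train.shape[0], -1)
    return x_train

Note that the discriminator's input size and the generator's output size would then have to change from 784 to the new per-image pixel count, x_train.shape[1].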

Related

Read image labels from a csv file

I have a dataset of medical images (.dcm) which I can read into TensorFlow as a batch. However, the problem I am facing is that the labels of these images are in a .csv file. The .csv file contains two columns: image_path (location of the image) and image_labels (0 for no; 1 for yes). I want to know how I can read the labels into a TensorFlow dataset batch-wise. I am using the following code to load the images:
import tensorflow as tf
import tensorflow_io as tfio

def process_image(filename):
    image_bytes = tf.io.read_file(filename)
    image = tf.squeeze(
        tfio.image.decode_dicom_image(image_bytes, on_error='strict', dtype=tf.uint16),
        axis=0
    )
    x = tfio.image.decode_dicom_data(image_bytes, tfio.image.dicom_tags.PhotometricInterpretation)
    image = (image - tf.reduce_min(image)) / (tf.reduce_max(image) - tf.reduce_min(image))
    if x == "MONOCHROME1":
        image = 1 - image
    image = image * 255
    image = tf.cast(tf.image.resize(image, (512, 512)), tf.uint8)
    return image

# train_images is a list containing the locations of .dcm images
dataset = tf.data.Dataset.from_tensor_slices(train_images)
dataset = dataset.map(process_image, num_parallel_calls=4).batch(50)
Hence, I can load the images into a TensorFlow dataset, but I would like to know how to load the image labels batch-wise as well.
Something like this instead of the last two lines should work:
# train_labels is a list of labels for each image, in the same order as train_images
dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
dataset = dataset.map(lambda x, y: (process_image(x), y), num_parallel_calls=4).batch(50)
Now the dataset can be passed to your network's .fit(), .predict(), and other methods:
model.fit(dataset, epochs=epochs, callbacks=callbacks)
Alternatively, you can create a second dataset containing the labels and then combine the two datasets with tf.data.Dataset.zip(). It works similarly to Python's built-in zip; a sketch follows below.
I prefer the first method, since it feels a bit cleaner to me, and I can, for example, shuffle the filenames/labels and only then parse the files, instead of the other way around.
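For reference, a minimal sketch of the zip() alternative, reusing the same train_images, train_labels, and process_image from above:

# Build one dataset of decoded images and one of labels, then pair them up
images_ds = tf.data.Dataset.from_tensor_slices(train_images).map(
    process_image, num_parallel_calls=4)
labels_ds = tf.data.Dataset.from_tensor_slices(train_labels)
dataset = tf.data.Dataset.zip((images_ds, labels_ds)).batch(50)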

How to convert image from folder into tensors using torch?

I'm trying to convert images in a folder to tensors, save them, and load them later, as shown below
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor()])
dataset = datasets.ImageFolder(
    r'imagedata', transform=transform)
torch.save(dataset, 'train_data.pt')
But I get a ValueError when trying to load the saved file, as below:
train_codes = torch.Tensor(torch.load(os.path.join(self.data_dir, "train_data.pt")))
ValueError: only one element tensors can be converted to Python scalars
Any help or suggestion to fix this will be highly appreciated.
You met this problem because train_data.pt was not saved as a Tensor: the saved object is the ImageFolder dataset (which inherits from DatasetFolder), so it should be loaded and used as a torch Dataset. The example below uses a DataLoader, as the documentation shows:
import os
import torch
from torchvision import transforms, datasets

# Saving part
transform = transforms.Compose([
    transforms.ToTensor()
])
dataset = datasets.ImageFolder(r'imagedata', transform=transform)
torch.save(dataset, 'train_data.pt')

# Loading part: note a single torch.load, not torch.load(torch.load(...))
data = torch.load(os.path.join(self.data_dir, "train_data.pt"))
loader = torch.utils.data.DataLoader(data, batch_size=32)
for image, label in loader:
    # Processing....
    pass
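If the goal really is a single tensor holding all images, which is what the failing torch.Tensor(torch.load(...)) call appears to be after, a minimal sketch, assuming every image has the same shape after the transform (this loads the whole dataset into memory):

# Stack the per-image tensors into one (N, C, H, W) tensor
train_codes = torch.stack([img for img, _ in data])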

How can I use a function or loop on this resnet50 code to predict the components of multiple images (within a folder), instead of just one?

How can I do this for multiple images (within a folder) and put them into a Dataframe?
This is the code for analysing one image:
import numpy as np
from keras.preprocessing import image
from keras.applications import resnet50
import warnings
warnings.filterwarnings('ignore')
# Load Keras' ResNet50 model that was pre-trained against the ImageNet database
model = resnet50.ResNet50()
# Load the image file, resizing it to 224x224 pixels (required by this model)
img = image.load_img("rgotunechair10.jpg", target_size=(224, 224))
# Convert the image to a numpy array
x = image.img_to_array(img)
# Add a fourth dimension since Keras expects a list of images
x = np.expand_dims(x, axis=0)
# Scale the input image to the range used in the trained network
x = resnet50.preprocess_input(x)
# Run the image through the deep neural network to make a prediction
predictions = model.predict(x)
# Look up the names of the predicted classes. Index zero is the results for the first image.
predicted_classes = resnet50.decode_predictions(predictions, top=9)
image_components = []
for _, name, _ in predicted_classes[0]:
    image_components.append(name)
print(image_components)
This is the output:
['desktop_computer', 'desk', 'monitor', 'space_bar', 'computer_keyboard', 'typewriter_keyboard', 'screen', 'notebook', 'television']
First of all, move the code for analyzing an image into a function, and return the result there instead of printing it:
import os
import numpy as np
from keras.preprocessing import image
from keras.applications import resnet50
import warnings
warnings.filterwarnings('ignore')
# Load Keras' ResNet50 model once, at module level, so it is not rebuilt for every image
model = resnet50.ResNet50()
def run_resnet50(image_name):
    # Load the image file, resizing it to 224x224 pixels (required by this model)
    img = image.load_img(image_name, target_size=(224, 224))
    # Convert the image to a numpy array
    x = image.img_to_array(img)
    # Add a fourth dimension since Keras expects a list of images
    x = np.expand_dims(x, axis=0)
    # Scale the input image to the range used in the trained network
    x = resnet50.preprocess_input(x)
    # Run the image through the deep neural network to make a prediction
    predictions = model.predict(x)
    # Look up the names of the predicted classes. Index zero is the results for the first image.
    predicted_classes = resnet50.decode_predictions(predictions, top=9)
    image_components = []
    for _, name, _ in predicted_classes[0]:
        image_components.append(name)
    return image_components
Then, get all images inside the desired folder (for instance, the current directory):
images_path = '.'
images = [f for f in os.listdir(images_path) if f.endswith('.jpg')]
Run the function on all images, get the result:
result = [run_resnet50(img_name) for img_name in images]
This result will be a list of lists, which you can then move into a DataFrame; a sketch follows below. If you want to keep the image name for each result, build a dictionary keyed by filename instead.
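A minimal sketch of that last step, reusing the images and result variables from above (the column names are just illustrative):

import pandas as pd

# One row per image; 'components' holds each image's list of predicted class names
df = pd.DataFrame({'image': images, 'components': result})

# Or keep the mapping explicit with a dictionary keyed by filename
by_name = {img_name: run_resnet50(img_name) for img_name in images}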

how to include files with tf.data.Dataset

I am training a face-recognition model, so for the triplet loss I have to generate batches such that each batch contains a fixed number of images from each label. For example, each batch should take 8 images from each of 3 random labels, as suggested in this GitHub issue.
In my dataset folder I have subfolders, each renamed to a label and containing the images for that label.
In the linked issue, this solution is presented:
import os
import cv2
import numpy as np
import tensorflow as tf

num_labels = len(path_list)
num_classes_per_batch = 3
num_images_per_class = 8

image_dirs = ["/content/drive/My Drive/smalld_processed/train/{:d}".format(i)
              for i in range(num_labels)]

## Create the list of datasets creating filenames
#datasets = [tf.data.Dataset.list_files(f"{image_dir}/*.jpg" for image_dir in image_dirs)]
datasets = [tf.data.Dataset.list_files(f"{image_dir}/*.jpg") for image_dir in image_dirs]
adk = ["{}/*.jpg".format(image_dir) for image_dir in image_dirs]
print(adk)

def generator():
    while True:
        # Sample the labels that will compose the batch
        labels = np.random.choice(range(num_labels),
                                  num_classes_per_batch,
                                  replace=False)
        for label in labels:
            for _ in range(num_images_per_class):
                yield label

choice_dataset = tf.data.Dataset.from_generator(generator, tf.int64)
dataset = tf.data.experimental.choose_from_datasets(datasets, choice_dataset)

## Now you read the image content
def load_image(filename):
    image = cv2.imread(filename, 1)
    image = dataset.map(image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    image = image[..., ::-1]
    label = int(os.path.split(os.path.dirname(filename))[1])
    image = dataset1.append()
    label = dataset2.append
    return image, label

dataset = dataset.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
batch_size = num_classes_per_batch * num_images_per_class
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(None)
With this I am not able to load the images, and it shows me this error:
SystemError: <built-in function imread> returned NULL without setting an error
Could you help me fix the error, or suggest another way to load the images? Thanks in advance!
I think that in this case your cv2.imread is acting up: inside dataset.map, filename is a symbolic tf.Tensor rather than a Python string, so imread cannot open it and returns NULL. I would first build a simple program that does not do the reading "on the fly", but instead pre-loads the images and trains on a small dataset.
It also feels like you are misusing the dataset.map function. I would recommend this tutorial on tf.data.Dataset: http://tensorexamples.com/2020/07/27/Using-the-tf.data.Dataset.html, and maybe this one on augmentation, so you can see how to use the map function properly: http://tensorexamples.com/2020/07/28/Augmentation.html.
Good luck!
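To illustrate the last point, a minimal sketch of a load_image built from TensorFlow ops, which do work on the symbolic string tensors that dataset.map passes in; it assumes JPEG files and that the parent directory name is the integer label, as in the original snippet:

import tensorflow as tf

def load_image(filename):
    # tf.io ops accept the symbolic filename tensor provided by dataset.map
    image = tf.io.read_file(filename)
    image = tf.io.decode_jpeg(image, channels=3)
    # The parent directory name is the label, e.g. ".../train/3/img.jpg" -> 3
    label = tf.strings.to_number(tf.strings.split(filename, "/")[-2],
                                 out_type=tf.int64)
    return image, label

dataset = dataset.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)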

Convergence K-Means Unsupervised Image Clustering Pre-trained Keras Grayscale Image

I'm new to image clustering, and I followed a tutorial that results in the following code:
import os
import numpy as np
from sklearn.cluster import KMeans
from keras.preprocessing import image
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input

model = VGG16(weights='imagenet', include_top=False)
directory = './imageSample'
vgg16_feature_list = []
for filename in os.listdir(directory):
    if filename != '.DS_Store':
        img_path = directory + '/' + filename
        print(img_path)
        img = image.load_img(img_path)  # color_mode = "grayscale"
        img_data = image.img_to_array(img)
        img_data = np.expand_dims(img_data, axis=0)
        img_data = preprocess_input(img_data)
        vgg16_feature = model.predict(img_data)
        vgg16_feature_np = np.array(vgg16_feature)
        vgg16_feature_list.append(vgg16_feature_np.flatten())
vgg16_feature_list_np = np.array(vgg16_feature_list)
kmeans = KMeans(n_clusters=5, random_state=0).fit(vgg16_feature_list_np)
However, I'm receiving this error:
ConvergenceWarning: Number of distinct clusters (1) found smaller than n_clusters (5). Possibly due to duplicate points in X.
return_n_iter=True)
I wonder if it is because of the sample images? They are 80x80 pixels, and there are 52 of them.
I tried changing the color mode to grayscale; however, I then received
IndexError: index 1 is out of bounds for axis 3 with size 1
Kindly advise whether such clustering is feasible with my dataset. Will it work if I expand the dataset to perhaps 100-200 images, or is there another approach I should look at for grouping the dataset? Thanks!
UPDATE
It seems that the real issue is that the same features were extracted from different images, so I've moved this to another post: Keras Same Feature Extraction from Different Images
