how to include files with tf.data.Dataset


I am training a face-recognition model. For triplet loss, I have to generate batches that contain a fixed number of images from each label. For example, each batch should take 8 images from each of 3 random labels, as suggested in this GitHub issue.
In my dataset folder, each subfolder is named after a label and contains the images for that label.
The issue presents this solution:
import os
import numpy as np
import cv2
import tensorflow as tf
num_labels = len(path_list)
num_classes_per_batch = 3
num_images_per_class = 8
image_dirs = ["/content/drive/My Drive/smalld_processed/train/{:d}".format(i) for i in
              range(num_labels)]
## Create the list of datasets creating filenames
#datasets = [tf.data.Dataset.list_files(f"{image_dir}/*.jpg" for image_dir in image_dirs)]
datasets = [tf.data.Dataset.list_files(f"{image_dir}/*.jpg") for image_dir in image_dirs]
adk = ["{}/*.jpg".format(image_dir) for image_dir in image_dirs]
print(adk)
def generator():
    while True:
        # Sample the labels that will compose the batch
        labels = np.random.choice(range(num_labels),
                                  num_classes_per_batch,
                                  replace=False)
        for label in labels:
            for _ in range(num_images_per_class):
                yield label
choice_dataset = tf.data.Dataset.from_generator(generator, tf.int64)
dataset = tf.data.experimental.choose_from_datasets(datasets, choice_dataset)
## Now you read the image content
def load_image(filename):
    image = cv2.imread(filename,1)
    image = dataset.map(image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    image = image[...,::-1]
    label = int(os.path.split(os.path.dirname(filename))[1])
    image=dataset1.append()
    label=dataset2.append
    return image, label
dataset = dataset.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
batch_size = num_classes_per_batch * num_images_per_class
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(None)
With this code I cannot load the images; it fails with this error:
SystemError: <built-in function imread> returned NULL without setting an error
Could you help me fix the error, or suggest another way to load the images?
Thanks in advance!

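I think cv2.imread is what fails here: inside dataset.map, load_image receives filename as a symbolic tf.Tensor rather than a Python string, so OpenCV most likely cannot open it. I would first build a simple program that does not read the images "on the fly" but instead preloads them, and train on a small dataset.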
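A minimal NumPy sketch of that preloading approach, assuming all images are already loaded into one array `images` of shape (N, H, W, C) with an integer array `labels` of shape (N,); the function name `balanced_batches` is hypothetical. Each batch holds `num_images_per_class` images from each of `num_classes_per_batch` randomly chosen labels, the same 3 × 8 scheme as in the question:

```python
import numpy as np

def balanced_batches(images, labels, num_classes_per_batch=3,
                     num_images_per_class=8, seed=None):
    """Yield (batch_images, batch_labels) forever, taking num_images_per_class
    images from each of num_classes_per_batch randomly chosen labels."""
    rng = np.random.default_rng(seed)
    # Indices of the images belonging to each label
    by_label = {lab: np.flatnonzero(labels == lab) for lab in np.unique(labels)}
    # Only labels with enough images can be sampled without replacement
    eligible = [lab for lab, idx in by_label.items()
                if len(idx) >= num_images_per_class]
    while True:
        chosen = rng.choice(eligible, size=num_classes_per_batch, replace=False)
        idx = np.concatenate([
            rng.choice(by_label[lab], size=num_images_per_class, replace=False)
            for lab in chosen
        ])
        yield images[idx], labels[idx]
```

It needs at least `num_classes_per_batch` labels that each have at least `num_images_per_class` images; otherwise `rng.choice` raises a `ValueError`.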
It also looks like you are misusing dataset.map: it is a dataset method that applies a function to every element, not something to call on an image inside that function. I would recommend this tutorial on tf.data.Dataset: http://tensorexamples.com/2020/07/27/Using-the-tf.data.Dataset.html, and perhaps this one on augmentation, which shows how to use map properly: http://tensorexamples.com/2020/07/28/Augmentation.html.
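A sketch of a TensorFlow-only `load_image` that could replace the one in the question, assuming JPEG files laid out as `.../train/<label>/<file>.jpg` with integer folder names and `/` as the path separator; `IMAGE_SIZE` is a hypothetical value. Because it uses only TensorFlow ops, it works on the symbolic `filename` tensor that `Dataset.map` passes in:

```python
import tensorflow as tf

IMAGE_SIZE = (160, 160)  # hypothetical input size; use whatever your model expects

def load_image(filename):
    # Inside Dataset.map, `filename` is a scalar tf.string tensor, not a Python str,
    # so the file is read and decoded with TensorFlow ops instead of cv2.imread.
    image = tf.io.read_file(filename)
    image = tf.io.decode_jpeg(image, channels=3)  # decodes to RGB, so no BGR->RGB flip
    image = tf.image.resize(image, IMAGE_SIZE)    # same size everywhere so batch() works
    image = image / 255.0                         # resize returns float32; scale to [0, 1]
    # The label is the name of the parent folder: ".../train/2/img.jpg" -> 2
    parts = tf.strings.split(filename, "/")
    label = tf.strings.to_number(parts[-2], out_type=tf.int64)
    return image, label
```

It would be plugged in where the question already maps it, `dataset = dataset.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)`, before `batch` and `prefetch`. Since the per-class datasets from `list_files` are finite, calling `.repeat()` on each of them before `choose_from_datasets` may also be needed so that no class runs out during training.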
Good luck!

Related

Read image labels from a csv file

I have a dataset of medical images (.dcm) that I can read into TensorFlow in batches. However, the labels for these images are in a .csv file with two columns: image_path (location of the image) and image_labels (0 for no, 1 for yes). How can I read the labels into a TensorFlow dataset batch-wise? I am using the following code to load the images in batches:
import tensorflow as tf
import tensorflow_io as tfio
def process_image(filename):
    image_bytes = tf.io.read_file(filename)
    image = tf.squeeze(
        tfio.image.decode_dicom_image(image_bytes, on_error='strict', dtype=tf.uint16),
        axis=0
    )
    x = tfio.image.decode_dicom_data(image_bytes, tfio.image.dicom_tags.PhotometricInterpretation)
    image = (image - tf.reduce_min(image))/(tf.reduce_max(image) - tf.reduce_min(image))
    if x == "MONOCHROME1":
        image = 1 - image
    image = image*255
    image = tf.cast(tf.image.resize(image, (512, 512)), tf.uint8)
    return image
# train_images is a list containing the locations of .dcm images
dataset = tf.data.Dataset.from_tensor_slices(train_images)
dataset = dataset.map(process_image, num_parallel_calls=4).batch(50)
This way I can load the images into the TensorFlow dataset, but how can I load the image labels batch-wise as well?

Something like this, instead of the last two lines, should work:
#train_labels is a list of labels for each image in the same order as in train_images
dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
dataset = dataset.map(lambda x, y: (process_image(x), y), num_parallel_calls=4).batch(50)
Now the dataset can be passed to your network's .fit(), .predict() and other methods:
model.fit(dataset, epochs=epochs, callbacks=callbacks)
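The answer assumes `train_labels` already exists as a list. A minimal way to build both lists from the CSV with the standard library, assuming the file has a header row with the `image_path` and `image_labels` columns described in the question (`read_labels_csv` is a hypothetical helper name):

```python
import csv

def read_labels_csv(csv_path):
    """Return (paths, labels) from a CSV with image_path and image_labels columns."""
    paths, labels = [], []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # each row is a dict keyed by the header
            paths.append(row["image_path"])
            labels.append(int(row["image_labels"]))  # "0"/"1" -> 0/1
    return paths, labels
```

`train_images, train_labels = read_labels_csv("labels.csv")` then feeds straight into `from_tensor_slices((train_images, train_labels))`, and the two lists stay in the same order because they come from the same rows.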
Alternatively, you can create a second dataset containing the labels and then combine the two datasets with tf.data.Dataset.zip(), which works much like Python's built-in zip.
I prefer the first method because it feels a bit cleaner to me, and it lets me, for example, shuffle the filenames and labels first and only then parse the files, instead of the other way around.
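A sketch of the `zip` alternative for comparison; `zipped_dataset` and `parse_fn` are hypothetical names, with `parse_fn` standing in for a per-file function such as `process_image`:

```python
import tensorflow as tf

def zipped_dataset(filenames, labels, parse_fn, batch_size=50):
    # One dataset of parsed images and one of labels, kept in the same order
    # (map preserves element order by default).
    images_ds = tf.data.Dataset.from_tensor_slices(filenames).map(
        parse_fn, num_parallel_calls=tf.data.AUTOTUNE)
    labels_ds = tf.data.Dataset.from_tensor_slices(labels)
    # zip pairs the i-th image with the i-th label, like Python's zip
    return tf.data.Dataset.zip((images_ds, labels_ds)).batch(batch_size)
```

Shuffling each dataset separately would break the pairing, so any shuffle has to come after `zip`, at which point the shuffle buffer holds decoded images instead of filenames. That is the trade-off behind preferring `from_tensor_slices((train_images, train_labels))`.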

Loading Custom Dataset via Keras

I've got a simple Keras-based GAN model that I use to generate handwritten digit images from the MNIST dataset. I want to create a similar Keras dataset from the raw image data of the Sokoto Coventry Fingerprint Dataset (SOCOFing), which consists of 6,000 black-and-white fingerprint images, and apply it to the same GAN model. The problem is that I'm stuck creating and loading/processing the custom dataset.
This is the code from the model that I use for MNIST:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # must be set before keras is first imported
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from keras.layers import Input
from keras.models import Model, Sequential
from keras.layers.core import Dense, Dropout
from keras.layers.advanced_activations import LeakyReLU
from keras.datasets import mnist
from tensorflow.keras.optimizers import Adam
from keras import initializers
np.random.seed(10)
random_dim = 100
def load_mnist_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = (x_train.astype(np.float32) - 127.5)/127.5
    x_train = x_train.reshape(60000, 784)
    return (x_train, y_train, x_test, y_test)
For experimentation, I've created a smaller version of the SOCOFing dataset that contains only 500 samples. The code for the dataset generator is as follows:
from PIL import Image
import os
import numpy as np
path_to_files = "./fingerprints/"
vectorized_images_X = []
vectorized_images_Y = []
for _, file in enumerate(os.listdir(path_to_files)):
    image = Image.open(path_to_files + file)
    image_array = np.array(image)
    vectorized_images_X.append(image_array)
    vectorized_images_Y.append(image_array)
np.savez("./fingerprints.npz",DataX=vectorized_images_X,DataY=vectorized_images_Y)
import numpy as np
path = "./fingerprints.npz"
with np.load(path) as data:
    train_data = data['DataX']
    print(train_data)
    test_data = data['DataY']
    print(test_data)
So now I've got a *.npz file, but I don't know how to feed it into the model. Please advise.

Here is sample code for loading any .npz file; you can adapt it:
import numpy as np
import tensorflow as tf
DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'
path = tf.keras.utils.get_file('mnist.npz', DATA_URL)
with np.load(path) as data:
    train_examples = data['x_train']
    train_labels = data['y_train']
    test_examples = data['x_test']
    test_labels = data['y_test']
For more details, you can follow this link.
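To use the fingerprints file with the GAN, one option is a counterpart of `load_mnist_data()` that applies the same scaling and flattening. This is a sketch assuming every image in `DataX` is single-channel and the same size (if `Image.open` yields RGB arrays, calling `.convert("L")` when building the file would give grayscale); `load_fingerprint_data` is a hypothetical name:

```python
import numpy as np

def load_fingerprint_data(path="./fingerprints.npz"):
    """Counterpart of load_mnist_data() for the fingerprints .npz (sketch).
    Assumes DataX is a uint8 array of shape (N, H, W)."""
    with np.load(path) as data:
        x_train = data["DataX"]
    # Same scaling as load_mnist_data(): [0, 255] -> [-1, 1]
    x_train = (x_train.astype(np.float32) - 127.5) / 127.5
    # Flatten each image into a vector, as reshape(60000, 784) does for MNIST
    x_train = x_train.reshape(x_train.shape[0], -1)
    return x_train
```

Anything in the model sized for MNIST's 784 = 28 × 28 pixels would then need to become height × width of the fingerprint images.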

how to train model with batches

I am trying a YOLO model in Python.
To process the data and annotations, I'm taking the data in batches:
batchsize = 50
#boxList= []
#boxArr = np.empty(shape = (0,26,5))
for i in range(0, len(box_list), batchsize):
    boxList = box_list[i:i+batchsize]
    imagesList = image_list[i:i+batchsize]
    #to convert the annotation from VOC format
    convertedBox = np.array([np.array(get_boxes_for_id(box_l)) for box_l in boxList])
    #pre-process on image and annotation
    image_data, boxes = process_input_data(imagesList,max_boxes,convertedBox)
    boxes = np.array(list(itertools.chain.from_iterable(boxes)))
    detectors_mask, matching_true_boxes = get_detector_mask(boxes, anchors)
After this, I want to pass my data to the model for training.
When I append the batches to a list, I get a memory error because of the array size,
and when I append them to an array, I get a dimensionality error because of the shape.
How can I train on this data, and should I use model.fit() or model.train_on_batch()?

If you are using Keras to train your model on a set of images, you can use a training generator and a validation generator; all you have to do is put your images in their respective class folders. Also take a look at this link, it may help you: https://keras.io/preprocessing/image/. I hope I have answered your question, unless I misunderstood it.
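For the memory error in particular, one option is not to accumulate the batches at all but to produce them on demand with a Python generator. A minimal sketch, where `batch_generator` is an illustrative name and `preprocess` is a hypothetical callback standing in for the question's `get_boxes_for_id` / `process_input_data` / `get_detector_mask` steps; it must return an `(inputs, targets)` tuple for one batch:

```python
import numpy as np

def batch_generator(image_list, box_list, batch_size, preprocess):
    """Yield one preprocessed (inputs, targets) batch at a time, forever."""
    n = len(image_list)
    while True:  # Keras-style generators are expected to loop indefinitely
        order = np.random.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield preprocess([image_list[i] for i in idx],
                             [box_list[i] for i in idx])
```

In TensorFlow 2.x Keras, `model.fit(batch_generator(image_list, box_list, 50, preprocess), steps_per_epoch=math.ceil(len(image_list) / 50), epochs=...)` consumes it directly; alternatively, call `model.train_on_batch(x, y)` on each `next()` result in your own loop. Either way only one batch is held in memory at a time.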

Pytorch - handling pictures and .jpeg files (beginner's questions)

I am new to PyTorch and have a couple of questions about how pictures are handled:
1) In the "Training a Classifier" tutorial, the pictures are PIL images and are handled with the following commands (where "transform" also converts the PIL format into a tensor):
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
It seems that trainset[1] (and likewise every other index) consists of a tensor and a number. I want to define a new variable "image" that holds the tensor part of trainset[1] and then print it. How can I do that?
2) Suppose I have a different dataset that I want to classify. It consists of .jpeg images located in the folder "C:/temp/dataset". How can I define the variable "trainset" so that it consists of these images?
Thanks a lot in advance!

For your first question:
image = trainset[1][0]
print(image)
For your second question:
from PIL import Image
import numpy as np
import os
def load_image(infilename):
    """This function loads an image into memory when you give it
    the path of the image
    """
    img = Image.open(infilename).convert("RGB")  # ensure 3 channels
    img.load()
    data = np.asarray(img, dtype="float32")
    return data
def create_npy_from_image(images_folder, output_name, num_images, image_dim):
    """Loops through the images in a folder and saves all of them
    as a numpy array in output_name
    """
    image_matrix = np.empty((num_images, image_dim, image_dim, 3), dtype=np.float32)
    i = 0  # counts only the image files actually stored
    for filename in sorted(os.listdir(images_folder)):
        if filename.lower().endswith((".jpg", ".jpeg")):
            data = load_image(os.path.join(images_folder, filename))
            image_matrix[i] = data
            i += 1
    np.save(output_name, image_matrix[:i])  # drop unused rows
So I would write something like this:
create_npy_from_image(path_to_images_folder, "trainset.npy", number_of_images_in_your_folder, DIM)
DIM is 64, for example, if your images are 64x64x3.
You can then load the saved array with np.load and convert it to a PyTorch tensor with torch.from_numpy.
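A sketch of that last step, wrapping the saved array in a `DataLoader`; `make_loader` is an illustrative name, and `labels` is a hypothetical argument, since a folder of JPEGs carries no labels by itself:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loader(npy_path, labels, batch_size=4, shuffle=True):
    images = np.load(npy_path)                        # (N, H, W, 3) float32
    x = torch.from_numpy(images).permute(0, 3, 1, 2)  # to PyTorch's (N, C, H, W) layout
    y = torch.as_tensor(labels, dtype=torch.long)     # one class index per image
    return DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=shuffle)
```

If the images are instead sorted into one subfolder per class, `torchvision.datasets.ImageFolder` builds an equivalent labelled dataset directly, without the .npy step.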
Let me know if this works. Good luck!

Pytorch: Can’t load images using ImageFolder

I'm trying to load images using ImageFolder.
data_dir = './train_dog'  # directory structure is train_dog/image
dset = datasets.ImageFolder(data_dir, transform)
train_loader = torch.utils.data.DataLoader(dset, batch_size=128, shuffle=True)
However, it doesn't seem to work, so I checked the stored data as below:
print(dset[0][0])
It shows only 3 tensors (size 64x64):
[torch.FloatTensor of size 3x64x64]
There are more than 10,000 images in the folder. Why can't it store all the data?

You should try this:
print(len(dset))
which gives the size of the dataset, i.e. the number of image files.
dset[0] is the first sample of the dataset (ImageFolder itself does not shuffle; the DataLoader does when shuffle=True). dset[0][0] is the input image tensor, here a single 3-channel 64x64 image, and dset[0][1] is the corresponding label or target.
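To see why `len(dset)` counts images while `dset[0][0]` is only one 3x64x64 image, here is a rough standard-library sketch of how ImageFolder indexes a directory. It is an approximation written for illustration, not torchvision's code, and the extension list is an assumption: each subfolder of the root is a class, each image file inside it becomes one `(path, class_index)` sample, and files placed directly in the root are ignored.

```python
import os

# Approximation of torchvision's default image extensions (assumption; check your version)
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")

def index_like_image_folder(root):
    """Return (classes, samples): one class per subfolder of root,
    one (path, class_index) sample per image file found under it."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for dirpath, _, filenames in sorted(os.walk(os.path.join(root, name))):
            for fname in sorted(filenames):
                if fname.lower().endswith(IMG_EXTENSIONS):
                    samples.append((os.path.join(dirpath, fname), class_to_idx[name]))
    return classes, samples
```

With the question's layout, `train_dog/image/...`, this gives a single class, `image`, and `len(samples)` should match `len(dset)`.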

Develop Reference
