If one uses tf.keras.preprocessing.image_dataset_from_directory('DIR') for this file structure:
main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg
How would one make a dataset from this file structure?
main_directory/
...class_a/
......subclass_1/
.........a_image_1.jpg
......subclass_2/
.........a_image_2.jpg
...class_b/
......subclass_1/
.........b_image_1.jpg
......subclass_2/
.........b_image_2.jpg
I want tensorflow to train on both the class and subclass for each image or maybe just train on the concatenation of each image's class and subclass names, that works too.
tf.keras.preprocessing.image_dataset_from_directory('DIR') uses the immediate subfolders of the given directory as the classes for the model.
For your second directory structure, it will therefore produce only two classes,
class_a and class_b.
Look through the TensorFlow guide on decoding images; it will help you.
In the parse_image function, you can decide what to use as the label:
import os
import tensorflow as tf

def parse_image(filename):
    # Split the path into its components; the folder names carry the label
    parts = tf.strings.split(filename, os.sep)
    label = parts[-2]
    image = tf.io.read_file(filename)
    image = tf.io.decode_jpeg(image)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [128, 128])
    return image, label
Here the label is just the name of the folder that directly contains the image. If you run the same code on your structure, you will get only two classes, subclass_1 and subclass_2, because the subfolders under class_b have the same names as those under class_a.
If you want to tell them apart, change the label assignment as follows:
    label = parts[-3] + '_' + parts[-2]
Now there are four classes in total: class_a_subclass_1, class_a_subclass_2, class_b_subclass_1, and class_b_subclass_2.
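The label logic above can be checked without TensorFlow. The following is a minimal plain-Python sketch, assuming the class/subclass/file layout from the question; label_from_path and build_label_index are hypothetical helper names, not part of any TensorFlow API.

```python
from pathlib import PurePosixPath


def label_from_path(path):
    """Return (class, subclass, combined) for a path ending in class/subclass/file."""
    parts = PurePosixPath(path).parts
    class_name, subclass_name = parts[-3], parts[-2]
    return class_name, subclass_name, f"{class_name}_{subclass_name}"


def build_label_index(paths):
    """Map each combined label to an integer id, in sorted order."""
    combined = sorted({label_from_path(p)[2] for p in paths})
    return {name: idx for idx, name in enumerate(combined)}


# Example paths taken from the directory layout in the question
paths = [
    "main_directory/class_a/subclass_1/a_image_1.jpg",
    "main_directory/class_a/subclass_2/a_image_2.jpg",
    "main_directory/class_b/subclass_1/b_image_1.jpg",
    "main_directory/class_b/subclass_2/b_image_2.jpg",
]
label_index = build_label_index(paths)
```

Returning the class and subclass separately also covers the case where the model should predict both, while the integer ids are the form that a sparse categorical loss would expect.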
I want to run real time object detection using YOLOv5 on a camera and then generate vector embeddings for cropped images of detected objects.
I currently generate image embeddings for locally saved images using the function below:
from PIL import Image

def generate_img_embedding(img_file_path):
    images = [
        Image.open(img_file_path)
    ]
    # Encoding a single image takes ~20 ms
    embeddings = embedding_model.encode(images)
    return embeddings
I also start the YOLOv5 object detection with image cropping as follows:
import os

def start_camera(productid):
    print("Attempting to start camera")
    # productid = "11011"
    try:
        command = "python ./yolov5/detect.py --source 0 --save-crop --name " + productid + " --project ./cropped_images"
        os.system(command)
        print("Camera running")
    except Exception as e:
        print("error starting camera!", e)
How can I modify the YOLOv5 model to pass the cropped images into my embedding function in real time?
Just take a look at the detect.py supplied with yolov5, the file you are running. The implementation is pretty short (~150 SLOC), I would recommend re-implementing it or modifying for your use case.
Key points, omitting a lot of (important, but standard and easily understandable) data transforms and parameter parsing, are as follows:
device = select_device(device)
model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data)
# Code selecting FP16/FP32 omitted here
model.warmup(imgsz=(1 if pt else bs, 3, *imgsz), half=half)
for path, im, im0s, vid_cap, s in dataset:
    im = torch.from_numpy(im).to(device)
    # Image transforms omitted
    pred = model(im, augment=augment, visualize=visualize)  # stage 1
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)  # stage 2
    for i, det in enumerate(pred):
        if len(det):
            # Rescale boxes from img_size to im0 size
            det[:, :4] = scale_coords(im.shape[2:], det[:, :4], im0.shape).round()
            # --> This is where you would access detections in real time! <--
Most of the code's logic is handling the I/O (in particular, dataset loading is handled by either LoadStreams or LoadImages from yolov5's utils), the rest is just rescaling input images, loading a torch model, and running detection and NMS. No rocket science here.
The least-effort path for you would be to copy the entire script and implement your embeddings under
    for *xyxy, conf, cls in reversed(det):
Instead of saving to a file, you would take the box corners in xyxy (x1, y1, x2, y2) and crop the image using e.g. Pillow's Image.crop() or by slicing the NumPy array directly. Which one works for you depends on the input that your embedding_model.encode expects.
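The NumPy slicing route could look roughly like the sketch below. It assumes each detection row is laid out as (x1, y1, x2, y2, conf, cls) in pixel coordinates, as in the loop above; crop_detections is a hypothetical helper, and the frame and boxes are stand-in data rather than real camera output.

```python
import numpy as np


def crop_detections(frame, detections):
    """Crop each detection out of an (H, W, C) frame.

    `detections` holds rows of (x1, y1, x2, y2, conf, cls) in pixel coordinates.
    """
    height, width = frame.shape[:2]
    crops = []
    for x1, y1, x2, y2, conf, cls in detections:
        # Clamp the box to the frame so that slicing never wraps around
        left, top = max(int(x1), 0), max(int(y1), 0)
        right, bottom = min(int(x2), width), min(int(y2), height)
        if right > left and bottom > top:
            crops.append(frame[top:bottom, left:right].copy())
    return crops


frame = np.zeros((10, 12, 3), dtype=np.uint8)        # stand-in for one camera frame
detections = np.array([[2, 3, 6, 8, 0.9, 0.0],
                       [-4, 1, 5, 20, 0.5, 1.0]])    # second box spills outside the frame
crops = crop_detections(frame, detections)
```

Each crop could then be handed to the embedding function inside the detection loop. If the model expects PIL images, Image.fromarray can convert the array, keeping in mind that frames read through OpenCV are in BGR channel order.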
tl; don't want to read
How do I convert YoloV5 model results into results.pandas(), sort it, then convert it back into results so I can access the useful methods like results.render() or results.crop()?
Context:
I've recently learned how to load and do inference with a YoloV5 model:
# Load model
model = torch.hub.load('./yolov5', 'custom', path='/content/drive/MyDrive/models/best.pt', source='local') # local repo
# Import Image
im1 = 'https://ultralytics.com/images/zidane.jpg'
im2 = 'https://ultralytics.com/images/bus.jpg'
# Do Inference
results = model([im1, im2])
I also learned that this results object returned from inference has really useful methods for getting the result in different formats:
imgs = results.render() # gives image results with bounding boxes
crops = results.crop(save=True) # cropped detections dictionary
df = results.pandas().xyxy[0] # Pandas DataFrame of 1 image
n_df = results.pandas().xyxyn[0] # Pandas DataFrame of 1 image with normalized coordinates
My use case is to sort the detections and keep the top 20 by confidence:
top_20 = results.pandas().xyxy[0].sort_values('confidence', ascending=False).head(20)  # top 20 sorted by confidence
Now I'm not sure how to turn this back into a results object so that I can still use utility methods like .render() and .crop().
I think I could also create my own render and crop functions with OpenCV using my sorted dataframes as args, but I was just wondering if there was a more intuitive way to just reuse those utility methods.
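For the do-it-yourself route mentioned above, a minimal sketch might look like this. It sidesteps the question of rebuilding a results object and crops straight from the image array instead. It assumes the DataFrame uses the column names of results.pandas().xyxy[0] (xmin, ymin, xmax, ymax, confidence, class, name); top_k_crops is a hypothetical helper and the values are stand-in data.

```python
import numpy as np
import pandas as pd


def top_k_crops(image, detections, k=20):
    """Return the k most confident rows and the matching crops from an (H, W, C) image."""
    top = detections.sort_values("confidence", ascending=False).head(k)
    # Box corners as integer pixel coordinates, one row per detection
    boxes = top[["xmin", "ymin", "xmax", "ymax"]].to_numpy().astype(int)
    crops = [image[y1:y2, x1:x2].copy() for x1, y1, x2, y2 in boxes]
    return top.reset_index(drop=True), crops


# Stand-in detections for a single image
detections = pd.DataFrame({
    "xmin": [0.0, 2.0, 1.0],
    "ymin": [0.0, 2.0, 1.0],
    "xmax": [4.0, 6.0, 3.0],
    "ymax": [4.0, 8.0, 2.0],
    "confidence": [0.30, 0.90, 0.60],
    "class": [0, 0, 1],
    "name": ["person", "person", "tie"],
})
image = np.zeros((10, 10, 3), dtype=np.uint8)
top, crops = top_k_crops(image, detections, k=2)
```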
I am currently preparing a synthetic dataset for an object detection task. Annotated datasets such as COCO and Open Images V6 are available for this kind of task. I am trying to download images from them, but only the foreground objects of a specific class, e.g. person; in other words, cut-outs of the objects on a transparent background. The reason is that I want to edit those cut-outs and then insert them into new images, e.g. a street scene.
What I have tried so far: I used a library called FiftyOne to download the dataset with its semantic labels, but I am stuck here and I don't know what else to do.
It is not necessary to use FiftyOne; any other method would work.
Here is the code that I used to download a sample of the dataset with its labels:
import fiftyone as fo
import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    dataset_dir="path/fiftyone",
    label_types=["segmentations"],
    classes=["person"],
    max_samples=10,
    label_field="instances",
    dataset_name="coco-images-person",
)
# Export the dataset
dataset.export(
    export_dir="path/fiftyone/image-segmentation-dataset",
    dataset_type=fo.types.ImageSegmentationDirectory,
    label_field="instances",
)
Thank you
The easiest way to do this is by using FiftyOne to iterate over your dataset in a simple Python loop, using OpenCV and Numpy to format and write the images of object instances to disk.
For example, this function will take in any collection of FiftyOne samples (either a Dataset or a View) and write all object instances to disk in folders separated by class label:
import os
import cv2
import numpy as np
def extract_classwise_instances(samples, output_dir, label_field, ext=".png"):
    print("Extracting object instances...")
    for sample in samples.iter_samples(progress=True):
        img = cv2.imread(sample.filepath)
        img_h, img_w, c = img.shape
        for det in sample[label_field].detections:
            mask = det.mask
            # bounding_box is [x, y, w, h] in relative (0-1) coordinates
            [x, y, w, h] = det.bounding_box
            x = int(x * img_w)
            y = int(y * img_h)
            # Use the mask's own pixel size for the crop
            h, w = mask.shape
            mask_img = img[y:y+h, x:x+w, :]
            # Turn the boolean mask into an alpha channel (BGR -> BGRA)
            alpha = mask.astype(np.uint8) * 255
            alpha = np.expand_dims(alpha, 2)
            mask_img = np.concatenate((mask_img, alpha), axis=2)
            label = det.label
            label_dir = os.path.join(output_dir, label)
            os.makedirs(label_dir, exist_ok=True)
            output_filepath = os.path.join(label_dir, det.id + ext)
            cv2.imwrite(output_filepath, mask_img)
Here is a complete example that loads a subset of the COCO2017 dataset and writes all "person" instances to disk:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F
dataset_name = "coco-image-example"
if dataset_name in fo.list_datasets():
    fo.delete_dataset(dataset_name)
label_field = "ground_truth"
classes = ["person"]
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    label_types=["segmentations"],
    classes=classes,
    max_samples=20,
    label_field=label_field,
    dataset_name=dataset_name,
)
view = dataset.filter_labels(label_field, F("label").is_in(classes))
output_dir = "/path/to/output/segmentations/dir/"
os.makedirs(output_dir, exist_ok=True)
extract_classwise_instances(view, output_dir, label_field)
If this capability is something that will be used regularly, it may be useful to write a custom dataset exporter for this format.
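Since the stated goal is to insert the extracted instances into new scenes, the next step could be sketched as follows. This is a minimal NumPy example, assuming a 4-channel cutout like the ones written above and a 3-channel background; paste_cutout is a hypothetical helper and the arrays are stand-in data.

```python
import numpy as np


def paste_cutout(background, cutout, x, y):
    """Alpha-blend a (h, w, 4) cutout onto a (H, W, 3) background at top-left (x, y).

    Assumes the cutout fits entirely inside the background.
    """
    h, w = cutout.shape[:2]
    color = cutout[:, :, :3].astype(np.float32)
    alpha = cutout[:, :, 3:4].astype(np.float32) / 255.0   # shape (h, w, 1), values in [0, 1]
    out = background.copy()
    region = out[y:y + h, x:x + w].astype(np.float32)
    # Opaque cutout pixels replace the background, transparent ones leave it untouched
    blended = alpha * color + (1.0 - alpha) * region
    out[y:y + h, x:x + w] = blended.astype(np.uint8)
    return out


background = np.zeros((4, 4, 3), dtype=np.uint8)
cutout = np.zeros((2, 2, 4), dtype=np.uint8)
cutout[0, 0] = [200, 200, 200, 255]   # one opaque pixel, the rest fully transparent
scene = paste_cutout(background, cutout, x=1, y=1)
```

When loading the saved PNG files back with OpenCV, cv2.imread(path, cv2.IMREAD_UNCHANGED) keeps the alpha channel; the default read mode drops it.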
I have generated many number plate images like the one below. [image]
Now I want to convert all such images so that they look like real-world vehicle number plate images.
For example:
How can I apply this type of augmentation and save all the augmented images in another folder?
Solution
Check out the albumentations library. Try to answer the question: "What is the difference between the image you have and the image you want?" For instance, the real-world image:
is more pixelated,
is grainy,
has lower resolution,
could also have nails/fastening screws on it,
may have something else written under or over the main number,
may have shadows on it,
may be unevenly bright in places, etc.
Albumentations helps you come up with many types of image augmentations. Please try to break down the problem as I suggested, and then find out which augmentations you need from albumentations.
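To make the breakdown concrete, here is a minimal NumPy sketch of three of the listed differences (lower resolution, grain, uneven brightness). It deliberately does not use albumentations; the function names, parameters, and the blank stand-in plate are assumptions for illustration, and a real pipeline would more likely use the library's own transforms.

```python
import numpy as np


def pixelate(image, factor):
    """Lower the effective resolution by keeping every `factor`-th pixel and repeating it."""
    small = image[::factor, ::factor]
    big = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return big[:image.shape[0], :image.shape[1]]


def add_grain(image, sigma, rng):
    """Add Gaussian noise with standard deviation `sigma` and clip back to uint8."""
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def uneven_brightness(image, low=0.6, high=1.0):
    """Scale brightness with a left-to-right gradient from `low` to `high`."""
    ramp = np.linspace(low, high, image.shape[1], dtype=np.float32)
    shaded = image.astype(np.float32) * ramp[None, :, None]
    return np.clip(shaded, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
plate = np.full((50, 120, 3), 255, dtype=np.uint8)   # stand-in for a clean synthetic plate
degraded = add_grain(uneven_brightness(pixelate(plate, 4)), sigma=10.0, rng=rng)
```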
Example of image augmentation using albumentations
The following code block (source) shows you how to apply albumentations for image augmentation. In case you had an image and a mask, both of them will undergo identical transformations.
Another example from kaggle: Image Augmentation Demo with albumentation
# Note: the IAA* transforms below wrap imgaug and are only available in older
# albumentations releases; newer releases offer Perspective, PiecewiseAffine,
# Sharpen, and Emboss as replacements.
from albumentations import (
    HorizontalFlip, IAAPerspective, ShiftScaleRotate, CLAHE, RandomRotate90,
    Transpose, Blur, OpticalDistortion, GridDistortion, HueSaturationValue,
    IAAAdditiveGaussianNoise, GaussNoise, MotionBlur, MedianBlur, IAAPiecewiseAffine,
    IAASharpen, IAAEmboss, RandomBrightnessContrast, Flip, OneOf, Compose
)
import numpy as np

def strong_aug(p=0.5):
    return Compose([
        RandomRotate90(),
        Flip(),
        Transpose(),
        OneOf([
            IAAAdditiveGaussianNoise(),
            GaussNoise(),
        ], p=0.2),
        OneOf([
            MotionBlur(p=0.2),
            MedianBlur(blur_limit=3, p=0.1),
            Blur(blur_limit=3, p=0.1),
        ], p=0.2),
        ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.2, rotate_limit=45, p=0.2),
        OneOf([
            OpticalDistortion(p=0.3),
            GridDistortion(p=0.1),
            IAAPiecewiseAffine(p=0.3),
        ], p=0.2),
        OneOf([
            CLAHE(clip_limit=2),
            IAASharpen(),
            IAAEmboss(),
            RandomBrightnessContrast(),
        ], p=0.3),
        HueSaturationValue(p=0.3),
    ], p=p)

image = np.ones((300, 300, 3), dtype=np.uint8)
mask = np.ones((300, 300), dtype=np.uint8)
whatever_data = "my name"
augmentation = strong_aug(p=0.9)
data = {"image": image, "mask": mask, "whatever_data": whatever_data, "additional": "hello"}
augmented = augmentation(**data)
image, mask, whatever_data, additional = augmented["image"], augmented["mask"], augmented["whatever_data"], augmented["additional"]
Strategy
First tone down the number of augmentations to a bare minimum
Save a single augmented-image
Save a few images post augmentation.
Now test and update your augmentation pipeline to suit your requirements of mimicking the ground-truth scenario.
Finalize your pipeline and run it on a larger number of images.
Time it: how long this takes for how many images.
Then finally run it on all the images: this time you can have a time estimate on how long it is going to take to run it.
NOTE: every time an image passes through the augmentation pipeline, only a single instance of augmented image comes out of it. So, say you want 10 different augmented versions of each image, you will need to pass each image through the augmentation pipeline 10 times, before moving on to the next image.
# This will not be what you end up using,
# but it lets you begin to understand what
# you need to do with it.
def simple_aug(p=0.5):
    return Compose([
        RandomRotate90(),
        # Flip(),
        # Transpose(),
        OneOf([
            IAAAdditiveGaussianNoise(),
            GaussNoise(),
        ], p=0.2),
    ], p=p)

# for a single image: check first
image = ...  # write your code to read in your image here
augmentation = simple_aug(p=0.5)
augmented = augmentation(image=image)  # albumentations expects keyword arguments
# SAVE the image
# If you are using imageio or PIL, saving an image
# is rather straightforward, and I will let you
# figure that out.
# save the content of the variable: augmented['image']
For multiple images
Assuming each image passes through the augmentation pipeline 10 times, your code could look as follows:
import os
# I assume you have a way of loading your
# images from the filesystem, and they come
# out of `images` (an iterator)
NUM_AUG_REPEAT = 10
AUG_SAVE_DIR = 'data/augmented'
# create the directory if not present already
os.makedirs(AUG_SAVE_DIR, exist_ok=True)
# This will create augmentation ids for the same image
# example: '00', '01', '02', ..., '08', '09' for
# - NUM_AUG_REPEAT = 10
aug_id = lambda x: str(x).zfill(len(str(NUM_AUG_REPEAT)))
for img_idx, image in enumerate(images):
    for i in range(NUM_AUG_REPEAT):
        data = {'image': image}
        augmented = augmentation(**data)
        # I assume you have a function: save_image(image_path, image)
        # You need to write this function with
        # whatever logic necessary. (Hint: use imageio or PIL.Image)
        # The image index keeps different source images from overwriting each other
        image_filename = f'image_{img_idx}_{aug_id(i)}.png'
        save_image(os.path.join(AUG_SAVE_DIR, image_filename), augmented['image'])
I am trying to build a TensorFlow Dataset API pipeline (TF version 1.8) for a set of images of different sizes. To do this, I extract patches of the same size from the images and feed them to my neural net.
The problem is that with tf.extract_image_patches the patches get stored in the channel dimension. As each image has a different size, the number of patches differs per image, so the shape of each resulting tensor is different and I can't batch them together using the Dataset API.
Can someone suggest changes to my modify_image function below to tackle this issue?
I guess separating the patches into different images and then batching them together would work, but I can't figure out how to do that.
I want to scan the whole image, so randomly selecting an equal number of patches won't work for me.
import glob
import tensorflow as tf

def modify_image(image):
    '''add preprocessing functions here'''
    image = tf.expand_dims(image, 0)
    image = tf.extract_image_patches(
        image,
        ksizes=[1, patch_size, patch_size, 1],
        strides=[1, patch_size, patch_size, 1],
        rates=[1, 1, 1, 1],
        padding='SAME',
        name=None
    )
    image = tf.reshape(image, shape=[-1, patch_size, patch_size, 1])
    return image

def parse_function(image, labels):
    image = tf.read_file(image)
    image = tf.image.decode_image(image)
    labels = tf.read_file(labels)
    labels = tf.image.decode_image(labels)
    image = modify_image(image)
    labels = modify_image(labels)
    return image, labels

def list_files(directory):
    files = glob.glob(directory)
    return files

def load_dataset(img_dir, labels_dir):
    images = list_files(img_dir)
    images = tf.constant(images)
    labels = list_files(labels_dir)
    labels = tf.constant(labels)
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    dataset = dataset.map(parse_function)
    return dataset

def make_batches(home_dir, img_dir, labels_dir, batch_size):
    img_dir = home_dir + img_dir
    labels_dir = home_dir + labels_dir
    dataset = load_dataset(img_dir, labels_dir)
    batched_dataset = dataset.batch(batch_size)
    return batched_dataset
The tf.contrib.data.unbatch() transformation might be helpful here, as it can separate the patches from a single image into different elements:
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset = dataset.map(parse_function)
patches_dataset = dataset.apply(tf.contrib.data.unbatch())
batched_dataset = patches_dataset.batch(batch_size)
Note that for tf.contrib.data.unbatch() to work, the number of patches in an image must match the number of elements/rows in labels. For example, if each patch should get the same label, you could achieve this by modifying parse_function() as follows to tf.tile() the labels an appropriate number of times:
def parse_function(image, labels):
    # ...
    return image, tf.tile([labels], tf.shape(image)[0:1])
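The effect of unbatching followed by batching can be illustrated outside TensorFlow. The following is a minimal NumPy sketch, assuming single-channel images and one scalar label per image (as in the tf.tile() variant above); extract_patches and batch_patches are hypothetical helpers, and the zero-padding on the bottom and right edges is a simplification of padding='SAME'.

```python
import numpy as np


def extract_patches(image, patch_size):
    """Split an (H, W) array into non-overlapping (patch_size, patch_size) tiles."""
    h, w = image.shape
    # Zero-pad the bottom and right edges up to a multiple of patch_size
    padded = np.pad(image, ((0, (-h) % patch_size), (0, (-w) % patch_size)))
    rows = padded.shape[0] // patch_size
    cols = padded.shape[1] // patch_size
    tiles = padded.reshape(rows, patch_size, cols, patch_size).swapaxes(1, 2)
    return tiles.reshape(-1, patch_size, patch_size)


def batch_patches(images, labels, patch_size, batch_size):
    """Flatten per-image patches into one stream, then yield fixed-size batches."""
    patches, patch_labels = [], []
    for image, label in zip(images, labels):
        tiles = extract_patches(image, patch_size)
        patches.extend(tiles)                        # "unbatch": one element per patch
        patch_labels.extend([label] * len(tiles))    # repeat the label once per patch
    for start in range(0, len(patches), batch_size):
        yield (np.stack(patches[start:start + batch_size]),
               np.array(patch_labels[start:start + batch_size]))


# Two images of different sizes give 4 and 6 patches respectively
images = [np.arange(16).reshape(4, 4), np.zeros((4, 6))]
labels = [0, 1]
batches = list(batch_patches(images, labels, patch_size=2, batch_size=4))
```

Because the patches are flattened into one stream before batching, a batch may mix patches from different images, and only the final batch can be smaller than batch_size.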