In detectron2 there are class IDs instead of class names - python

I finished training a model for instance segmentation in detectron2. When I test images from the training files there is no problem: the class names (apple, banana, orange) are written on the image. But when I downloaded some fruit images from the internet, the class names are not written on the photos; there are class IDs instead.
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
cfg.DATASETS.TEST = ("fruit_test", )
predictor = DefaultPredictor(cfg)

image_path = "/content/detectron2_custom_dataset/testimages/test2.jpg"

def on_image(image_path, predictor):
    im = cv2.imread(image_path)
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1], metadata={}, scale=0.5, instance_mode=ColorMode.SEGMENTATION)
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.figure(figsize=(14, 10))
    plt.imshow(v.get_image())
    plt.show()

on_image(image_path, predictor)
In conclusion, I want to test my model on images I upload now, and I don't want class IDs on the images. I want class names like orange, banana and apple on them.

After searching for an easy solution, I came up with this, which is the easiest IMO.
Create a new Metadata class:
class Metadata:
    def get(self, name, default=None):
        # Visualizer calls metadata.get("thing_classes", None), so accept
        # (name, default) like dict.get and return the label list.
        return ['apple', 'banana', 'orange', 'etc']  # your class labels
then provide an instance of it in the Visualizer line:
v = Visualizer(im[:, :, ::-1], Metadata(), scale=0.5, instance_mode=ColorMode.SEGMENTATION)
Otherwise, you're required to register a dataset with (dummy) data, or even a single annotated image, and then load its metadata, which I find a bit irrelevant at this stage if you only need it for inference.

You can get the labels by populating the metadata kwarg, which contains the mapping.
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TEST[0]), scale=0.5, instance_mode = ColorMode.SEGMENTATION)
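If the name in cfg.DATASETS.TEST was never registered in the current process, its catalog entry will be empty. In that case you can attach the class names to it directly; a minimal sketch (assuming "fruit_test" matches cfg.DATASETS.TEST, and that the label order matches the category ids used during training):
from detectron2.data import MetadataCatalog

# Attach the class names to the catalog entry for the test dataset.
MetadataCatalog.get("fruit_test").set(thing_classes=["apple", "banana", "orange"])
metadata = MetadataCatalog.get("fruit_test")  # pass this to Visualizer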

I would change metadata = {} in the Visualizer line to
{"thing_classes": ['orange', 'banana', 'apple']}
so the entire line would be
v = Visualizer(im[:, :, ::-1], {"thing_classes": ['orange', 'banana', 'apple']}, scale=0.5, instance_mode=ColorMode.SEGMENTATION)
This is based on the following code in the draw_instance_predictions function (detectron2/utils/visualizer.py):
labels = _create_text_labels(classes, scores, self.metadata.get("thing_classes", None))
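Putting it together, a minimal sketch of the full inference helper with the dict metadata (the label order here is an assumption and must match the category ids used during training):
import cv2
import matplotlib.pyplot as plt
from detectron2.utils.visualizer import ColorMode, Visualizer

def on_image(image_path, predictor):
    im = cv2.imread(image_path)
    outputs = predictor(im)
    # A plain dict works because Visualizer only calls metadata.get(...)
    v = Visualizer(im[:, :, ::-1],
                   metadata={"thing_classes": ['orange', 'banana', 'apple']},
                   scale=0.5,
                   instance_mode=ColorMode.SEGMENTATION)
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.figure(figsize=(14, 10))
    plt.imshow(v.get_image())
    plt.show()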

Related

How to load a TF Lite Model into Python from a file

I've followed the End-to-End image classification tutorial for tensorflow lite and have created and saved my model as '/path/to/model.tflite'.
What I haven't been able to figure out is how to load it.
I'm looking for some kind of syntax that is similar to this:
from tflite_model_maker import image_classifier
from tflite_model_maker.image_classifier import DataLoader
model = image_classifier.Load('/path/to/model.tflite')
I'm sure I'm missing something obvious here, and this is definitely not the first place I've looked. This seems to be the best place to find what I need, but the syntax used confuses me.
What I want to be able to do with the model:
test = DataLoader.from_folder('/path/to/testImages')
loss, accuracy = model.evaluate(test)
# A helper function that returns 'red'/'black' depending on whether its two
# input parameters match or not.
def get_label_color(val1, val2):
    if val1 == val2:
        return 'black'
    else:
        return 'red'

# Then plot 100 test images and their predicted labels.
# If a prediction result is different from the label provided in the "test"
# dataset, we will highlight it in red.
test_data = data
plt.figure(figsize=(20, 20))
predicts = model.predict_top_k(test_data)
for i, (image, label) in enumerate(test_data.gen_dataset().unbatch().take(100)):
    ax = plt.subplot(10, 10, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(image.numpy(), cmap=plt.cm.gray)
    predict_label = predicts[i][0][0]
    color = get_label_color(predict_label,
                            test_data.index_to_label[label.numpy()])
    ax.xaxis.label.set_color(color)
    plt.xlabel('Predicted: %s' % predict_label)
plt.show()
From the syntax above it seems the model isn't just a file but a type/class/method, depending on whichever name is most suitable for Python.
It feels like this should only take one line of code, but I haven't been able to find it anywhere.
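For the loading step itself, tflite_model_maker doesn't appear to expose a Load() for exported .tflite files; the standard way to load one is TensorFlow's tf.lite.Interpreter. A minimal sketch (note this gives a raw interpreter, so the model-maker evaluate()/predict_top_k() helpers are not available on it):
import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path='/path/to/model.tflite')
interpreter.allocate_tensors()

# Inspect the input/output details to learn the expected shapes and dtypes.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run inference on a dummy input of the right shape.
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
predictions = interpreter.get_tensor(output_details[0]['index'])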
Managed to do a simple version of it. Having the images come up as a stream using cv2 doesn't work for me on Windows the way it does on the Pi, so instead I created a webpage in the same directory as this script. The script generates an image with the bounding boxes, using a specified tflite model. This is in no way ideal.
It uses a webcam to get the image and saves it to the directory the script is run in. It then renames the file so it can be viewed by the webpage I set up to view it.
The majority of this code comes from the TFLite Object Detection Raspberry Pi sample.
import time, os
import sys  # sys.exit is used in the capture loop below
from PIL import Image
from tflite_support import metadata
import platform
from typing import List, NamedTuple
import json
import cv2 as cv2
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt

Interpreter = tf.lite.Interpreter
load_delegate = tf.lite.experimental.load_delegate
class ObjectDetectorOptions(NamedTuple):
    """A config to initialize an object detector."""

    enable_edgetpu: bool = False
    """Enable the model to run on EdgeTPU."""

    label_allow_list: List[str] = None
    """The optional allow list of labels."""

    label_deny_list: List[str] = None
    """The optional deny list of labels."""

    max_results: int = -1
    """The maximum number of top-scored detection results to return."""

    num_threads: int = 1
    """The number of CPU threads to be used."""

    score_threshold: float = 0.0
    """The score threshold of detection results to return."""


class Rect(NamedTuple):
    """A rectangle in 2D space."""
    left: float
    top: float
    right: float
    bottom: float


class Category(NamedTuple):
    """A result of a classification task."""
    label: str
    score: float
    index: int


class Detection(NamedTuple):
    """A detected object as the result of an ObjectDetector."""
    bounding_box: Rect
    categories: List[Category]


def edgetpu_lib_name():
    """Returns the library name of EdgeTPU in the current platform."""
    return {
        'Darwin': 'libedgetpu.1.dylib',
        'Linux': 'libedgetpu.so.1',
        'Windows': 'edgetpu.dll',
    }.get(platform.system(), None)
class ObjectDetector:
    """A wrapper class for a TFLite object detection model."""

    _OUTPUT_LOCATION_NAME = 'location'
    _OUTPUT_CATEGORY_NAME = 'category'
    _OUTPUT_SCORE_NAME = 'score'
    _OUTPUT_NUMBER_NAME = 'number of detections'

    def __init__(
        self,
        model_path: str,
        options: ObjectDetectorOptions = ObjectDetectorOptions()
    ) -> None:
        """Initialize a TFLite object detection model.

        Args:
            model_path: Path to the TFLite model.
            options: The config to initialize an object detector. (Optional)

        Raises:
            ValueError: If the TFLite model is invalid.
            OSError: If the current OS isn't supported by EdgeTPU.
        """
        # Load metadata from model.
        displayer = metadata.MetadataDisplayer.with_model_file(model_path)

        # Save model metadata for preprocessing later.
        model_metadata = json.loads(displayer.get_metadata_json())
        process_units = model_metadata['subgraph_metadata'][0]['input_tensor_metadata'][0]['process_units']
        mean = 0.0
        std = 1.0
        for option in process_units:
            if option['options_type'] == 'NormalizationOptions':
                mean = option['options']['mean'][0]
                std = option['options']['std'][0]
        self._mean = mean
        self._std = std

        # Load label list from metadata.
        file_name = displayer.get_packed_associated_file_list()[0]
        label_map_file = displayer.get_associated_file_buffer(file_name).decode()
        label_list = list(filter(lambda x: len(x) > 0, label_map_file.splitlines()))
        self._label_list = label_list

        # Initialize TFLite model.
        if options.enable_edgetpu:
            if edgetpu_lib_name() is None:
                raise OSError("The current OS isn't supported by Coral EdgeTPU.")
            interpreter = Interpreter(
                model_path=model_path,
                experimental_delegates=[load_delegate(edgetpu_lib_name())],
                num_threads=options.num_threads)
        else:
            interpreter = Interpreter(
                model_path=model_path, num_threads=options.num_threads)

        interpreter.allocate_tensors()
        input_detail = interpreter.get_input_details()[0]

        # From TensorFlow 2.6, the order of the outputs become undefined.
        # Therefore we need to sort the tensor indices of TFLite outputs and to know
        # exactly the meaning of each output tensor. For example, if
        # output indices are [601, 599, 598, 600], tensor names and indices aligned
        # are:
        # - location: 598
        # - category: 599
        # - score: 600
        # - detection_count: 601
        # because of the op's ports of TFLITE_DETECTION_POST_PROCESS
        # (https://github.com/tensorflow/tensorflow/blob/a4fe268ea084e7d323133ed7b986e0ae259a2bc7/tensorflow/lite/kernels/detection_postprocess.cc#L47-L50).
        sorted_output_indices = sorted(
            [output['index'] for output in interpreter.get_output_details()])
        self._output_indices = {
            self._OUTPUT_LOCATION_NAME: sorted_output_indices[0],
            self._OUTPUT_CATEGORY_NAME: sorted_output_indices[1],
            self._OUTPUT_SCORE_NAME: sorted_output_indices[2],
            self._OUTPUT_NUMBER_NAME: sorted_output_indices[3],
        }

        self._input_size = input_detail['shape'][2], input_detail['shape'][1]
        self._is_quantized_input = input_detail['dtype'] == np.uint8
        self._interpreter = interpreter
        self._options = options
    def detect(self, input_image: np.ndarray) -> List[Detection]:
        """Run detection on an input image.

        Args:
            input_image: A [height, width, 3] RGB image. Note that height and width
                can be anything since the image will be immediately resized according
                to the needs of the model within this function.

        Returns:
            A list of Detection objects.
        """
        image_height, image_width, _ = input_image.shape

        input_tensor = self._preprocess(input_image)
        self._set_input_tensor(input_tensor)
        self._interpreter.invoke()

        # Get all output details
        boxes = self._get_output_tensor(self._OUTPUT_LOCATION_NAME)
        classes = self._get_output_tensor(self._OUTPUT_CATEGORY_NAME)
        scores = self._get_output_tensor(self._OUTPUT_SCORE_NAME)
        count = int(self._get_output_tensor(self._OUTPUT_NUMBER_NAME))

        return self._postprocess(boxes, classes, scores, count, image_width,
                                 image_height)

    def _preprocess(self, input_image: np.ndarray) -> np.ndarray:
        """Preprocess the input image as required by the TFLite model."""
        # Resize the input
        input_tensor = cv2.resize(input_image, self._input_size)

        # Normalize the input if it's a float model (aka. not quantized)
        if not self._is_quantized_input:
            input_tensor = (np.float32(input_tensor) - self._mean) / self._std

        # Add batch dimension
        input_tensor = np.expand_dims(input_tensor, axis=0)

        return input_tensor

    def _set_input_tensor(self, image):
        """Sets the input tensor."""
        tensor_index = self._interpreter.get_input_details()[0]['index']
        input_tensor = self._interpreter.tensor(tensor_index)()[0]
        input_tensor[:, :] = image

    def _get_output_tensor(self, name):
        """Returns the output tensor with the given name."""
        output_index = self._output_indices[name]
        tensor = np.squeeze(self._interpreter.get_tensor(output_index))
        return tensor
    def _postprocess(self, boxes: np.ndarray, classes: np.ndarray,
                     scores: np.ndarray, count: int, image_width: int,
                     image_height: int) -> List[Detection]:
        """Post-process the output of the TFLite model into a list of Detection objects.

        Args:
            boxes: Bounding boxes of detected objects from the TFLite model.
            classes: Class index of the detected objects from the TFLite model.
            scores: Confidence scores of the detected objects from the TFLite model.
            count: Number of detected objects from the TFLite model.
            image_width: Width of the input image.
            image_height: Height of the input image.

        Returns:
            A list of Detection objects detected by the TFLite model.
        """
        results = []

        # Parse the model output into a list of Detection entities.
        for i in range(count):
            if scores[i] >= self._options.score_threshold:
                y_min, x_min, y_max, x_max = boxes[i]
                bounding_box = Rect(
                    top=int(y_min * image_height),
                    left=int(x_min * image_width),
                    bottom=int(y_max * image_height),
                    right=int(x_max * image_width))
                class_id = int(classes[i])
                category = Category(
                    score=scores[i],
                    label=self._label_list[class_id],  # 0 is reserved for background
                    index=class_id)
                result = Detection(bounding_box=bounding_box, categories=[category])
                results.append(result)

        # Sort detection results by score, descending
        sorted_results = sorted(
            results,
            key=lambda detection: detection.categories[0].score,
            reverse=True)

        # Filter out detections in the deny list
        filtered_results = sorted_results
        if self._options.label_deny_list is not None:
            filtered_results = list(
                filter(
                    lambda detection: detection.categories[0].label not in
                    self._options.label_deny_list, filtered_results))

        # Keep only detections in the allow list
        if self._options.label_allow_list is not None:
            filtered_results = list(
                filter(
                    lambda detection: detection.categories[0].label in
                    self._options.label_allow_list, filtered_results))

        # Return at most max_results detections.
        if self._options.max_results > 0:
            result_count = min(len(filtered_results), self._options.max_results)
            filtered_results = filtered_results[:result_count]

        return filtered_results
_MARGIN = 10  # pixels
_ROW_SIZE = 10  # pixels
_FONT_SIZE = 1
_FONT_THICKNESS = 1
_TEXT_COLOR = (0, 0, 255)  # red


def visualize(
    image: np.ndarray,
    detections: List[Detection],
) -> np.ndarray:
    """Draws bounding boxes on the input image and returns it.

    Args:
        image: The input RGB image.
        detections: The list of all "Detection" entities to be visualized.

    Returns:
        Image with bounding boxes.
    """
    for detection in detections:
        # Draw the bounding box
        start_point = detection.bounding_box.left, detection.bounding_box.top
        end_point = detection.bounding_box.right, detection.bounding_box.bottom
        cv2.rectangle(image, start_point, end_point, _TEXT_COLOR, 3)

        # Draw label and score
        category = detection.categories[0]
        class_name = category.label
        probability = round(category.score, 2)
        result_text = class_name + ' (' + str(probability) + ')'
        text_location = (_MARGIN + detection.bounding_box.left,
                         _MARGIN + _ROW_SIZE + detection.bounding_box.top)
        cv2.putText(image, result_text, text_location, cv2.FONT_HERSHEY_PLAIN,
                    _FONT_SIZE, _TEXT_COLOR, _FONT_THICKNESS)

    return image
# ---------------------------------- #
# This is where the custom code starts
# ---------------------------------- #

# Load the TFLite model
TFLITE_MODEL_PATH = 'object.tflite'
DETECTION_THRESHOLD = 0.5  # 50% threshold required before identifying

options = ObjectDetectorOptions(
    num_threads=4,
    score_threshold=DETECTION_THRESHOLD,
)

# Close the camera if it is already open
try:
    cap.release()
except:
    pass  # do nothing

detector = ObjectDetector(model_path=TFLITE_MODEL_PATH, options=options)

cap = cv2.VideoCapture(0)  # webcam
counter = 0  # Stores how many times the model has run

while cap.isOpened():
    success, image = cap.read()
    if not success:
        sys.exit(
            'ERROR: Unable to read from webcam. Please verify your webcam settings.'
        )
    image = cv2.flip(image, 1)

    # Convert the image from BGR to RGB as required by the TFLite model.
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    #image.thumbnail((512, 512), Image.ANTIALIAS)
    image_np = np.asarray(image)

    # Run object detection using the model.
    detections = detector.detect(image_np)

    # Draw the bounding boxes on the input image
    image_np = visualize(image_np, detections)

    if counter == 10:  # <- Change this to decide how many iterations
        cap.release()
        break

    image_np = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
    plt.imsave('tmp.jpg', image_np)  # Saves the image
    os.replace("tmp.jpg", "web.jpg")  # Renames it for the webpage
    counter += 1
    print(counter)

cap.release()
Here's the HTML document, placed in the same directory as the Python file. I saved it as index.html and opened it in the browser while running the Python script above.
<!DOCTYPE html>
<html>
  <head>
    <title>Object Detection</title>
  </head>
  <body>
    <h1>Object Detection</h1>
    <p>This displays images saved during the detection process</p>
    <canvas id="x" width="700px" height="500px"></canvas>
    <script>
      var newImage = new Image();
      newImage.src = "web.jpg";
      var canvas = document.getElementById("x");
      var context = canvas.getContext("2d");

      newImage.onload = function() {
        context.drawImage(newImage, 0, 0);
        console.log("trigger");
        setTimeout(timedRefresh, 1000);
      };

      function timedRefresh() {
        // just change the src attribute; this will always trigger the onload callback
        try {
          newImage.src = ("web.jpg#" + new Date().getTime());
        } catch (e) {
          console.log(e);
        }
      }

      setTimeout(timedRefresh, 100);
    </script>
  </body>
</html>
It's incredibly slow, not ideal in many ways, and it probably breaks many good coding conventions. It was only used locally; I would definitely not use this for a production environment nor recommend its use. I just needed a quick proof of concept, and this worked for that.
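If cv2's display window does work on your machine, the more conventional loop avoids the webpage entirely. A minimal sketch, reusing the detector and visualize defined above:
import cv2

cap = cv2.VideoCapture(0)
while cap.isOpened():
    success, image = cap.read()
    if not success:
        break
    image = cv2.flip(image, 1)
    # Detect on an RGB copy, draw on the BGR frame that imshow expects.
    detections = detector.detect(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    image = visualize(image, detections)
    cv2.imshow('detections', image)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()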

Creating your own dataset in tensorflow

I am faced with the task of classifying sounds by their spectrograms. I already have one solution to this problem (convert all audio recordings into spectrograms, save them as pictures, and train a neural network on those), but I want to go a simpler way: not save pictures, but convert the audio files into tensors directly. The problem is that I can't find any useful information on how to create my own dataset of tensors in TensorFlow. Here is an example of such code in PyTorch.
class SoundDataset(Dataset):
    def __init__(self, file_names, labels):
        self.file_names = file_names
        self.labels = labels

    def __getitem__(self, index):
        # format the file path and load the file
        path = self.file_names[index]
        scale, sr = librosa.load(path)
        filter_banks = librosa.filters.mel(n_fft=2048, sr=22050, n_mels=10)
        mel_spectrogram = librosa.feature.melspectrogram(scale, sr=sr, n_fft=2048, hop_length=512, n_mels=32)
        log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)
        trch = torch.from_numpy(log_mel_spectrogram)
        if log_mel_spectrogram.shape != (10, 87):
            delta = 87 - log_mel_spectrogram.shape[1]
            trch = torch.nn.functional.pad(trch, (0, delta))
        return trch, self.labels[index]

    def __len__(self):
        return len(self.file_names)
Here a class is created that takes paths to audio recordings and converts them into tensors, padding with zeros if the tensors do not fit the expected shape. How can I create the same class for TensorFlow? Next is an example of code that collects the file paths and their class labels, creates a SoundDataset object, and generates a dataset from these files accordingly. All this is written for PyTorch; tell me how it can be implemented for TensorFlow.
path = '/content/drive/MyDrive/МДМА/audiodata/for-rerecorded/training/'
files = []
labels = []
lbl = '1 0'.split()
for lab in lbl:
    if lab == '0':
        c = 'fake'
    else:
        c = 'real'
    names = os.listdir(path + c)
    for n in names:
        pth = path + c + '/' + n
        files.append(pth)
        labels.append(int(lab))

train_dataset = SoundDataset(files, labels)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=20)
If you read the documentation, there are code patterns for this. The following is not tested, but if you load the index from another data structure that has mapped the files to the indexes, then this code can help.
import librosa
import pathlib
import tensorflow as tf

DATASET_PATH = 'data/mini_speech_commands'
data_dir = pathlib.Path(DATASET_PATH)
if not data_dir.exists():
    tf.keras.utils.get_file(
        'mini_speech_commands.zip',
        origin="http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip",
        extract=True,
        cache_dir='.', cache_subdir='data')

def load_audio(filename):
    # Inside tf.py_function the argument is an eager string tensor,
    # so decode it to a plain Python string for librosa.
    scale, sr = librosa.load(filename.numpy().decode('utf-8'))
    mel_spectrogram = librosa.feature.melspectrogram(scale, sr=sr, n_fft=2048, hop_length=512, n_mels=32)
    log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)  # already a NumPy array
    spectrogram = log_mel_spectrogram
    if log_mel_spectrogram.shape != (10, 87):
        delta = 87 - log_mel_spectrogram.shape[1]
        # tf.pad expects one (before, after) pair per dimension
        spectrogram = tf.pad(spectrogram, [[0, 0], [0, delta]])
    return spectrogram  # return index

read_audio = lambda x: tf.py_function(load_audio,
                                      [x],
                                      tf.float32)  # librosa yields float32

filenames = tf.io.gfile.glob(str(data_dir) + '/*/*')
files_ds = tf.data.Dataset.from_tensor_slices(filenames)
waveform_ds = files_ds.map(map_func=read_audio)
The padding code is converted to TensorFlow directly, so you will have to test it.
Update: Another way, using keras.utils.Sequence, is shown in this thread.
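For reference, a minimal sketch of that keras.utils.Sequence approach (untested; it assumes the same fixed 87-frame time axis that the question's padding targets):
import math
import librosa
import numpy as np
import tensorflow as tf

class SoundSequence(tf.keras.utils.Sequence):
    """Yields batches of (log-mel spectrogram, label) pairs, computed on the fly."""

    def __init__(self, file_names, labels, batch_size=20):
        self.file_names = file_names
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.file_names) / self.batch_size)

    def __getitem__(self, idx):
        batch_files = self.file_names[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_labels = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
        specs = []
        for path in batch_files:
            scale, sr = librosa.load(path)
            mel = librosa.feature.melspectrogram(y=scale, sr=sr, n_fft=2048,
                                                 hop_length=512, n_mels=32)
            log_mel = librosa.power_to_db(mel)
            if log_mel.shape[1] < 87:
                # pad the time axis with zeros up to 87 frames
                log_mel = np.pad(log_mel, ((0, 0), (0, 87 - log_mel.shape[1])))
            specs.append(log_mel[:, :87])
        return np.stack(specs), np.asarray(batch_labels)

train_seq = SoundSequence(files, labels, batch_size=20)
# model.fit(train_seq, epochs=...)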

Multiple color bounding-boxes for different category objects

I am implementing the "BBAVectors-Oriented-Object-Detection" model (https://github.com/yijingru/BBAVectors-Oriented-Object-Detection). After testing the model, all the detected objects are shown with the same colour bounding boxes.
Following is the link to the test.py code: https://github.com/yijingru/BBAVectors-Oriented-Object-Detection/blob/master/test.py. I figured out that the following line of code changes the bounding box colour:
ori_image = cv2.drawContours(ori_image, [np.int0(box)], -1, (255,0,255),1,1)
However, it only takes a single colour at a time.
In addition, in another file of the repository, dataset_dota.py (https://github.com/yijingru/BBAVectors-Oriented-Object-Detection/blob/master/datasets/dataset_dota.py), I found the colours assigned to each object category.
class DOTA(BaseDataset):
    def __init__(self, data_dir, phase, input_h=None, input_w=None, down_ratio=None):
        super(DOTA, self).__init__(data_dir, phase, input_h, input_w, down_ratio)
        self.category = ['plane',
                         'baseball-diamond',
                         'bridge',
                         'ground-track-field',
                         'small-vehicle',
                         'large-vehicle',
                         'ship',
                         'tennis-court',
                         'basketball-court',
                         'storage-tank',
                         'soccer-ball-field',
                         'roundabout',
                         'harbor',
                         'swimming-pool',
                         'helicopter'
                         ]
        self.color_pans = [(204, 78, 210),
                           (0, 192, 255),
                           (0, 131, 0),
                           (240, 176, 0),
                           (254, 100, 38),
                           (0, 0, 255),
                           (182, 117, 46),
                           (185, 60, 129),
                           (204, 153, 255),
                           (80, 208, 146),
                           (0, 0, 204),
                           (17, 90, 197),
                           (0, 255, 255),
                           (102, 255, 102),
                           (255, 255, 0)]
        self.num_classes = len(self.category)
        self.cat_ids = {cat: i for i, cat in enumerate(self.category)}
        self.img_ids = self.load_img_ids()
        self.image_path = os.path.join(data_dir, 'images')
        self.label_path = os.path.join(data_dir, 'labelTxt')
I am new to Python and OpenCV. I would be thankful if someone could help me link these files and show the detected objects with different colour bounding boxes, or suggest another method for showing different object categories with different colours. Thank you.
In the current version the detected objects are shown as follows:
[image: detected samples]
Here is a snippet:
# Initialize the DOTA class
dota_object = DOTA(data_dir, phase)

# Get the colour assigned to a category
object = 'plane'
id = dota_object.category.index(object)
color = dota_object.color_pans[id]

# Display the image
ori_image = cv2.drawContours(ori_image, [np.int0(box)], -1, color, 1, 1)
Replace the variables with the ones in your code (ori_image, dota_object, data_dir, phase).
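To colour every detection by its own category rather than a single hard-coded one, you can look up the colour per category inside the drawing loop of test.py. A hedged sketch; decoded_results is a hypothetical {category_name: [box, ...]} dict standing in for whatever structure your decoding loop actually yields:
import cv2
import numpy as np

dota_object = DOTA(data_dir, phase)

# decoded_results is hypothetical; adapt the loop to however test.py
# iterates over its decoded detections per category.
for category, boxes in decoded_results.items():
    color = dota_object.color_pans[dota_object.cat_ids[category]]
    for box in boxes:
        ori_image = cv2.drawContours(ori_image, [np.int0(box)], -1, color, 1, 1)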

Pytorch/torchvision - How to increase limit of detectable objects

I am new to PyTorch and so far it has been incredible. I am using it to count the number of pills in an image. I have found that in the majority of my images the max number of objects that it detects is 100. For the picture below it reaches a max count of 100 with the confidence around 0.6, and after that it doesn't increase anymore, even down to 0.1 confidence. I haven't been able to find anything in the docs or anywhere else online. I am using the fasterrcnn_resnet50_fpn model. Below is the code that loads the trained model and evaluates the image. Any tips, or even different packages that would be able to count all objects, would be super helpful.
## Loading the trained model
loaded_model = get_model(num_classes=2)
loaded_model.load_state_dict(torch.load('Pillcount/model'))
os.chdir('../pytorchobjdet/vision')

class CountDataset(torch.utils.data.Dataset):
    def __init__(self, root, data_file, transforms=None):
        self.root = root
        self.transforms = transforms
        self.imgs = sorted(os.listdir(os.path.join(root, "count")))
        self.path_to_data_file = data_file

    def __getitem__(self, idx):
        # load images and bounding boxes
        img_path = os.path.join(self.root, "count", self.imgs[idx])
        img = Image.open(img_path).convert("RGB")
        box_list = parse_one_annot(self.path_to_data_file,
                                   self.imgs[idx])
        boxes = torch.as_tensor(box_list, dtype=torch.float32)
        num_objs = len(box_list)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)

dataset_count = CountDataset(root='../../Pill_Object_Detection',
                             data_file="../../Pill_Object_Detection/count_labels.csv",
                             transforms=get_transform(train=False))

idx = 1
img, _ = dataset_count[idx]

# put the model in evaluation mode
loaded_model.eval()
with torch.no_grad():
    prediction = loaded_model([img])

image = Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
draw = ImageDraw.Draw(image)

# draw the predicted boxes above the score threshold
count = 0
for element in range(len(prediction[0]["boxes"])):
    boxes = prediction[0]["boxes"][element].cpu().numpy()
    score = np.round(prediction[0]["scores"][element].cpu().numpy(),
                     decimals=4)
    if score > 0.6:
        draw.rectangle([(boxes[0], boxes[1]), (boxes[2], boxes[3])],
                       outline="red", width=3)
        draw.text((boxes[0], boxes[1]), text=str(score))
        count += 1
print(f'count = {count}')
image
The advice from the comment above was very helpful. I used the YOLOv5s model and it did an incredible job. The tutorial had a super easy setup that had you upload the annotated images into Roboflow, and it has Google Colab tutorials set up for almost all of the current object detectors out there. Here is the result. I just need to give it better quality training data, but it did extremely well for the few pictures that I gave it. It can count well over 150 objects in the same image, no problem.
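For anyone who wants to stay with torchvision: the 100-object ceiling comes from the box_detections_per_img argument of the torchvision detection models, which defaults to 100. A minimal sketch of raising it when constructing the model (num_classes=2 taken from the question; 500 is an arbitrary cap):
import torchvision

# Allow up to 500 detections per image instead of the default 100.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False,
    num_classes=2,
    box_detections_per_img=500,
)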

How can a class label be extracted from the filename returned by tf.WholeFileReader.read() in TensorFlow?

I have a bunch of images organized in subdirectories, which correspond to the class labels of the image. E.g.
images/1/0000001.jpg, images/1/0000002.jpg, ... for class 1 images
images/2/0123456.jpg, images/2/0123457.jpg, ... for class 2 images
Now, I was wondering how I could get the integer class label when using tf.WholeFileReader(), whose read method yields the file name as a tensor. Outside the graph, I could simply do int('images/2/0123457.jpg'.split('/')[1]) to get the integer label, but how do I do it inside the graph so that I can use the labels for model training? Below is a naive example; I am basically looking for the solution to the class_label = ... # get class label from file_name line:
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    filename_queue = tf.train.string_input_producer(
        tf.train.match_filenames_once('images/*/*.jpg'))
    image_reader = tf.WholeFileReader()
    file_name, image_raw = image_reader.read(filename_queue)
    file_name = tf.identity(file_name, name='file_name')
    image = tf.image.decode_jpeg(image_raw, name='image')
    class_label = ...  # get class label from file_name

with tf.Session(graph=g) as sess:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    image_tensor = sess.run('image:0')
    print('Image shape:', image_tensor.shape)
    file_name = sess.run('file_name:0')
    print('File name:', file_name)

    coord.request_stop()
    coord.join(threads)
Just found a solution to my question using tf.string_split and tf.string_to_number:
class_label = tf.string_split([file_name], '/').values[1]
class_label = tf.string_to_number(class_label, tf.int32)
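For reference, tf.string_split and tf.string_to_number belong to the old TF1 queue-based API; with a modern tf.data pipeline the same extraction looks roughly like this (a sketch, assuming the same images/<label>/<name>.jpg layout):
import tensorflow as tf

def parse_path(file_name):
    # 'images/2/0123457.jpg' -> parts = ['images', '2', '0123457.jpg']
    parts = tf.strings.split(file_name, '/')
    label = tf.strings.to_number(parts[1], tf.int32)
    image = tf.io.decode_jpeg(tf.io.read_file(file_name))
    return image, label

ds = tf.data.Dataset.list_files('images/*/*.jpg').map(parse_path)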
