I am looking to extract and identify digits from an image.
I've read a lot about digit recognition, but I did not find anything on adding rules to select only the digits we are interested in.
The rules would be "quite simple": for example, I want to extract only the digits circled with a blue pen.
I'm not expecting a complete solution here, just directions for research or links to similar problems.
I am quite familiar with neural networks and intend to use one for this, but I cannot see how to filter out only the circled digits.
Here is a sample of the picture. Imagine the same schema repeated several times on one picture.
I think you have three ways of approaching this, and you may not even need to go that far. For now, we will only look for which digit has been selected.
Case 1: You can try to use the Hough transform for circles to find the circles present in the image.
% Solution 1 (practically a perfect circle, use the Hough circle transform to find circles)
im = imread('https://i.stack.imgur.com/L7cE1.png');
[centers, radii, metric] = imfindcircles(im, [10, 60]);
imshow(im); viscircles(centers, radii,'EdgeColor','r');
Case 2: You can work in the blue color space and eliminate achromatic colors to segment the areas that interest you (adding some margin helps).
% Solution 2 (the mark is ALWAYS blue: read only the B channel of RGB and remove achromatic pixels)
b = im(:, :, 3) & (std(double(im(:, :, :)), [], 3) > 5);
bw = imfill(b,'holes');
stats = regionprops('table', bw, 'Centroid', 'MajorAxisLength','MinorAxisLength')
imshow(im); viscircles(stats.Centroid, stats.MajorAxisLength / 2,'EdgeColor','r');
Case 3: You can generate a dataset with positive cases and negative ones, and train a neural network with 10 sigmoid outputs, each indicating whether the corresponding digit is circled or not. The nice thing about this type of model is that you do not need a separate OCR step afterwards.
import keras
from keras.layers import *
from keras.models import Model
from keras.losses import mean_squared_error
from keras.applications.mobilenet import MobileNet

def model():
    WIDTH, HEIGHT = 128, 128
    mobile_input = Input(shape=(WIDTH, HEIGHT, 3))
    alpha = 0.25  # 0.25, 0.5, 1
    shape = (1, 1, int(1024 * alpha))
    dropout = 0.1

    mobile_model = MobileNet(input_shape=(WIDTH, HEIGHT, 3),
                             alpha=alpha,
                             include_top=False,
                             dropout=dropout,
                             pooling='avg')
    base_model = mobile_model(mobile_input)
    x = Reshape(shape, name='reshape_1')(base_model)
    x_gen = Dropout(dropout, name='dropout')(x)
    x = Conv2D(10, (1, 1), padding='same')(x_gen)
    x = Activation('sigmoid')(x)
    output_detection = Reshape((10,), name='output_mark_detection')(x)

    """x = Conv2D(2 * 10, (1, 1), padding='same')(x_gen)
    x = Activation('sigmoid')(x)
    output_position = Reshape((2 * 10, ), name='output_mark_position')(x)
    output = Concatenate(axis=-1)([output_detection, output_position])
    """

    model = Model(name="mark_net", inputs=mobile_input, outputs=output_detection)
    return model
It depends on your problem; the first two cases may already serve you. If you have varying lighting conditions, rotation, scaling, etc., I advise you to go directly to neural networks; you can create many "artificial" examples:
You can generate an artificial dataset by adding distorted circles (take a normal circle, apply random affine transformations, add noise, vary the blue color and the line a little, etc.).
Then you paste a circle randomly over each digit and generate the dataset labels indicating which digits are marked.
Once "stuck on the paper" you can apply data augmentation again to make it look more real; a rough sketch of this generation step is shown below.
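For illustration only, here is a minimal sketch of that generation idea with OpenCV. All helper names, patch sizes and parameter ranges below are my own assumptions and would need tuning for real scans:

import cv2
import numpy as np

def make_circled_digit(digit, rng=np.random):
    """Paste a randomly distorted blue circle on top of a digit patch (BGR)."""
    h, w = digit.shape[:2]
    ring = np.zeros_like(digit)
    # slightly randomised blue color, radius and line thickness (BGR order)
    color = (int(rng.randint(150, 256)), int(rng.randint(0, 80)), int(rng.randint(0, 80)))
    radius = int(min(h, w) * rng.uniform(0.35, 0.48))
    cv2.circle(ring, (w // 2, h // 2), radius, color, int(rng.randint(2, 4)))
    # random affine distortion of the ring (rotation, scale, small shift)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-15, 15), rng.uniform(0.9, 1.1))
    M[:, 2] += rng.uniform(-3, 3, size=2)
    ring = cv2.warpAffine(ring, M, (w, h))
    # overlay the ring where it was drawn, then add a little pixel noise
    out = digit.copy()
    mask = ring.any(axis=2)
    out[mask] = ring[mask]
    noise = rng.normal(0, 5, out.shape).astype(np.int16)
    return np.clip(out.astype(np.int16) + noise, 0, 255).astype(np.uint8)

# exercise the function on a fake "digit" patch drawn with putText
digit = np.full((64, 64, 3), 255, np.uint8)
cv2.putText(digit, "7", (18, 48), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 0), 2)
sample = make_circled_digit(digit)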
You can break the problem into two simpler sub-problems: you could train a first neural network to recognize the circles and isolate them. Once you have done that, you can train a second neural network to recognize the digits within the regions you isolated; a rough outline of the two stages is sketched below. Hope this helps.
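For illustration, a minimal sketch of how the two stages could fit together. circle_detector and digit_classifier are hypothetical trained models (not something from this post); the box format and input sizes are assumptions:

import cv2
import numpy as np

def read_circled_digits(image, circle_detector, digit_classifier):
    """Stage 1: locate blue circles; stage 2: classify the digit inside each crop."""
    digits = []
    for (x, y, w, h) in circle_detector(image):                    # stage 1: boxes (x, y, w, h)
        crop = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        crop = cv2.resize(crop, (28, 28)).astype(np.float32) / 255.0
        probs = digit_classifier.predict(crop[None, ..., None])    # stage 2: 10-class softmax assumed
        digits.append(int(np.argmax(probs)))
    return digits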
Introduction
I would like to create a Region Proposal Network (RPN) using VGG16 and the Keras (Python) framework. I am struggling to understand how to interpret the output of the RPN to predict bounding boxes for foreground objects.
Why does the RPN produce an array with 5x5 times the number of anchor boxes, and how do I know which element corresponds to which anchor box?
# Below is some lovely pseudo-code
array_of_feature_maps = topless_vgg_model.predict(pre_processed_img)
print(array_of_feature_maps.shape)
>>> (1,7,7,52)
all_anchor_boxes = get_potential_boxes_for_region_proposal()
print(len(all_anchor_boxes))
>>> 784
predicted_scores_for_anchor_boxes, predicted_adjustments = rpn_model.predict(input_feature_map)
# 4 * 784 = 3136
print(f"Scores Shape = {predicted_scores_for_anchor_boxes.shape}, Adjustments (Deltas) Shape = {predicted_adjustments.shape}")
>>> Scores Shape = (1,5,5,784), Adjustments (Deltas) Shape = (1,5,5,3136)
Have I made a mistake when creating the RPN? Can I just choose element [0][0][0] and get the scores / deltas? The main resources I am following are these:
https://dongjk.github.io/code/object+detection/keras/2018/05/21/Faster_R-CNN_step_by_step,_Part_I.html
https://dongjk.github.io/code/object+detection/keras/2018/06/10/Faster_R-CNN_step_by_step,_Part_II.html
Here is the nitty gritty code
The main function is at the top under the config dictionary.
from keras import Model
from keras import models
from keras import optimizers
from keras import Sequential
from keras import layers
from keras import losses
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
import keras.backend as K
import keras.applications
from keras import applications
from keras import utils
import cv2
import numpy as np
import os
import math
config = {
    "ImgPath" : "1. Data Gen\\1. Data\\1X9A1712.jpg" #"Put your image path here"
    ,"VGG16InputSize" : (224,224)
    ,"AnchorBox" : {
        "AspectRatioW_div_W" : [1/3,1/2,3/4,1]
        ,"Scales" : [1/2,3/4,1,3/2]
    }
}

def main(): ############ MAIN FUNCTION - START HERE ############
    # Get vgg model
    vggmodel = applications.VGG16(include_top=False,weights='imagenet')

    # Extract features for images (used dictionary comprehension to stop getting warning messages from Keras)
    list_of_images = [cv2.imread(config["ImgPath"])]
    array_of_prediction_ready_images = pre_process_image_for_vgg(list_of_images)
    array_of_feature_maps = vggmodel.predict(array_of_prediction_ready_images)

    # Find conversions from feature map (CNN output) to input image
    feature_to_input_x_scale, feature_to_input_y_scale, feature_to_input_x_offset, feature_to_input_y_offset = find_feature_map_to_input_scale_and_offset(array_of_prediction_ready_images[0], array_of_feature_maps[0])

    # Get potential boxes, aka anchor boxes
    potential_boxes = get_potential_boxes_for_region_proposal(array_of_prediction_ready_images[0], array_of_feature_maps[0], feature_to_input_x_scale, feature_to_input_y_scale, feature_to_input_x_offset, feature_to_input_y_offset)

    # Create region proposal network
    rpn_model = create_region_proposal_network(len(potential_boxes))

    # Output following (height, width, anchor_num) (height, width, anchor_num * 4)
    predicted_scores_for_anchor_boxes, predicted_adjustments = rpn_model.predict(array_of_feature_maps)
    print(f"predicted_scores_for_anchor_boxes.shape = {predicted_scores_for_anchor_boxes.shape}, predicted_adjustments.shape = {predicted_adjustments.shape}")
    print(f"But why is there the ,5,5, bit? I don't know which ones to choose now to get the predicted bounding box?")
def pre_process_image_for_vgg(img):
    """
    Resizes the image to input of VGGInputSize specified in the config dictionary
    Normalises the image
    Reshapes the image to an array of images e.g. [[img],[img],..]
    If img has a shape of
    """
    if type(img) == np.ndarray: # Single image
        resized_img = cv2.resize(img, config["VGG16InputSize"], interpolation=cv2.INTER_AREA)
        normalised_image = applications.vgg16.preprocess_input(resized_img)
        reshaped_to_array_of_images = np.array([normalised_image])
        return reshaped_to_array_of_images
    elif type(img) == list: # list of images
        img_list = img
        resized_img_list = [cv2.resize(image, config["VGG16InputSize"], interpolation=cv2.INTER_AREA) for image in img_list]
        resized_img_array = np.array(resized_img_list)
        normalised_images_array = applications.vgg16.preprocess_input(resized_img_array)
        return normalised_images_array

def find_feature_map_to_input_scale_and_offset(pre_processed_input_image, feature_maps):
    """
    Finds the scale and offset from the feature map (output) of the CNN classifier to the pre-processed input image of the CNN
    """
    # Find shapes of feature maps and input images to the classifier CNN
    input_image_shape = pre_processed_input_image.shape
    feature_map_shape = feature_maps.shape
    img_height, img_width, _ = input_image_shape
    features_height, features_width, _ = feature_map_shape

    # Find mapping from feature map (output of vggmodel.predict) back to the input image
    feature_to_input_x = img_width / features_width
    feature_to_input_y = img_height / features_height

    # Put anchor points in the centre of each feature map cell
    feature_to_input_x_offset = feature_to_input_x/2
    feature_to_input_y_offset = feature_to_input_y/2

    return feature_to_input_x, feature_to_input_y, feature_to_input_x_offset, feature_to_input_y_offset

def get_get_coordinates_of_anchor_points(feature_map, feature_to_input_x, feature_to_input_y, x_offset, y_offset):
    """
    Maps the CNN output (feature map) coordinates onto the input image of the CNN
    Returns the coordinates as a list of dictionaries with the format {"x":x, "y":y}
    """
    features_height, features_width, _ = feature_map.shape

    # For each feature map (x,y) determine the anchor point on the input image (x,y)
    feature_to_input_coords_x = [int(x_feature*feature_to_input_x + x_offset) for x_feature in range(features_width)]
    feature_to_input_coords_y = [int(y_feature*feature_to_input_y + y_offset) for y_feature in range(features_height)]
    coordinate_of_anchor_points = [{"x":x, "y":y} for x in feature_to_input_coords_x for y in feature_to_input_coords_y]

    return coordinate_of_anchor_points
def get_potential_boxes_for_region_proposal(pre_processed_input_image, feature_maps, feature_to_input_x, feature_to_input_y, x_offset, y_offset):
    """
    Generates the anchor points (the centres of the enlarged feature map) as (x,y) positions on the input image
    Generates all the potential bounding boxes for each anchor point
    Returns a list of potential bounding boxes in the form {"x1","y1","x2","y2"}
    """
    # Find shapes of input images to the classifier CNN
    input_image_shape = pre_processed_input_image.shape

    # For the feature map (x,y) determine the anchors on the input image (x,y) as array
    coordinate_of_anchor_boxes = get_get_coordinates_of_anchor_points(feature_maps, feature_to_input_x, feature_to_input_y, x_offset, y_offset)

    # Create potential boxes for classification
    boxes_width_height = generate_potential_box_dimensions(config["AnchorBox"], feature_to_input_x, feature_to_input_y)
    list_of_potential_boxes_for_coords = [generate_potential_boxes_for_coord(boxes_width_height, coord) for coord in coordinate_of_anchor_boxes]
    potential_boxes = [box for boxes_for_coord in list_of_potential_boxes_for_coords for box in boxes_for_coord]

    return potential_boxes

def generate_potential_box_dimensions(settings, feature_to_input_x, feature_to_input_y):
    """
    Generates potential box widths & heights for each anchor point (aka anchor boxes) given the
    feature-map-to-input scaling for x and y
    Assumption 1: settings will have the following attributes
        AspectRatioW_div_W: a list of float values representing the aspect ratios of
            the anchor boxes at each location on the feature map
        Scales: a list of float values representing the scales of the anchor boxes
            at each location on the feature map
    """
    box_width_height = []
    for scale in settings["Scales"]:
        for aspect_ratio_w_div_h in settings["AspectRatioW_div_W"]:
            width = round(feature_to_input_x*scale*aspect_ratio_w_div_h)
            height = round(feature_to_input_y*scale/aspect_ratio_w_div_h)
            box_width_height.append({"Width": width, "Height": height})
    return box_width_height

def generate_potential_boxes_for_coord(box_width_height, coord):
    """
    Assumption 1: box_width_height is a list of dictionaries, each consisting of
        {"Width": positive integer, "Height": positive integer}
    Assumption 2: coord is a dictionary of the form
        {"x": centre of box x coordinate, "y": centre of box y coordinate}
    """
    potential_boxes = []
    for box_dim in box_width_height:
        potential_boxes.append({
            "x1": coord["x"] - int(box_dim["Width"]/2)
            ,"y1": coord["y"] - int(box_dim["Height"]/2)
            ,"x2": coord["x"] + int(box_dim["Width"]/2)
            ,"y2": coord["y"] + int(box_dim["Height"]/2)
        })
    return potential_boxes
def create_region_proposal_network(number_of_potential_bounding_boxes, number_of_feature_map_channels=512):
    """
    Creates the region proposal network which takes the feature map as input,
    compiles the model and returns it
    RPN consists of an input layer, a CNN and two output layers.
        output_deltas:
        output_scores:
    Note: number of feature map channels should be the last element of model.predict().shape
    """
    # Input layer
    feature_map_tile = layers.Input(shape=(None,None,number_of_feature_map_channels), name="RPN_Input_Same")

    # CNN component
    convolution_3x3 = layers.Conv2D(filters=512, kernel_size=(3, 3), name="3x3")(feature_map_tile)

    # Output layers
    output_deltas = layers.Conv2D(filters=4 * number_of_potential_bounding_boxes, kernel_size=(1, 1), activation="linear", kernel_initializer="uniform", name="Output_Deltas")(convolution_3x3)
    output_scores = layers.Conv2D(filters=1 * number_of_potential_bounding_boxes, kernel_size=(1, 1), activation="sigmoid", kernel_initializer="uniform", name="Output_Prob_FG")(convolution_3x3)

    model = Model(inputs=[feature_map_tile], outputs=[output_scores, output_deltas])

    # TODO add loss_cls and smoothL1
    # Note: the keys of the loss dictionary must match the output layer names
    model.compile(optimizer='adam', loss={'Output_Prob_FG': losses.binary_crossentropy, 'Output_Deltas': losses.huber})
    return model

if __name__ == "__main__":
    main()
My Research So Far
[Step by step explanation of RPN + extra] - https://dongjk.github.io/code/object+detection/keras/2018/05/21/Faster_R-CNN_step_by_step,_Part_I.html
[vgg with top=false will only output the feature maps which is (7,7,512), other solutions will have different features produced] - https://github.com/keras-team/keras/issues/4465
[Understanding anchor boxes] - https://machinelearningmastery.com/padding-and-stride-for-convolutional-neural-networks/
[Faster RCNN - how they calculate stride] - https://stats.stackexchange.com/questions/314823/how-is-the-stride-calculated-in-the-faster-rcnn-paper
[Good article on Faster RCNN explained] - https://medium.com/@smallfishbigsea/faster-r-cnn-explained-864d4fb7e3f8
[Indicating that Anchor boxes should be determine by ratio and scale ratio should be width:height of 1:2 1:1 2:1 scale should be 1 1/2 1/3] - https://keras.io/examples/vision/retinanet/
[Best explanation of anchor boxes] - https://www.mathworks.com/help/vision/ug/anchor-boxes-for-object-detection.html#:~:text=Anchor%20boxes%20are%20a%20set,sizes%20in%20your%20training%20datasets
[Summary of object detection history, interesting read] - https://dudeperf3ct.github.io/object/detection/2019/01/07/Mystery-of-Object-Detection/
[Mask RCNN Jupyter Notebook] - https://github.com/matterport/Mask_RCNN/blob/master/samples/coco/inspect_model.ipynb
[RPN in Python Keras which i'm trying to understand] - https://github.com/dongjk/faster_rcnn_keras/blob/master/RPN.py
[RPN implementation Keras Python] - https://github.com/you359/Keras-FasterRCNN/blob/master/keras_frcnn/data_generators.py
[RPN implementation Well Commented] - https://github.com/virgil81188/Region-Proposal-Network/tree/03025cde75c1d634b608c277e6aa40ccdb829693
[RPN Loss function clearly explained] - https://www.geeksforgeeks.org/faster-r-cnn-ml/
[RPN Developed in Keras Python Framework] - https://github.com/alexmagsam/keras-rpn
wow you made it all the way to the bottom, hope you had a good read!
Keras/TensorFlow use the BHWC convention for tensor shapes (also called "channels-last"). Looking at the output shape of your VGG model, which is (1, 7, 7, 52), this means that the spatial grid is of size 7x7 and there are 52 channels. The RPN you defined outputs a tensor of shape (1, 5, 5, 784), which, as you guessed, has a lower spatial resolution than the VGG output.
From your RPN code, the explanation is simple: you used a Conv2D with a kernel size of 3x3 and the default value for padding, which is 'valid'. This means that the output spatial extent will be smaller than the input one, because the convolution is only computed at "valid" locations, i.e. where the kernel fits inside the input tensor.
padding='same' will solve this issue and you'll have a tensor of shape (1, 7, 7, 784).
The formula to compute the output tensor shape with respect to the input tensor shape and the convolution parameters (channels-first notation, from the PyTorch documentation) is:
H_out = floor((H_in + 2*padding[0] - dilation[0]*(kernel_size[0] - 1) - 1) / stride[0] + 1)
W_out = floor((W_in + 2*padding[1] - dilation[1]*(kernel_size[1] - 1) - 1) / stride[1] + 1)
where padding, dilation, kernel_size and stride are tuples of (int, int) corresponding to the values for the height dimension and the width dimension.
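As a quick sanity check (not from the original answer; it assumes a 512-channel feature map like VGG16's include_top=False output), you can compare the two padding modes directly:

from keras import layers, Model
import numpy as np

feature_map = np.zeros((1, 7, 7, 512), dtype="float32")
inp = layers.Input(shape=(None, None, 512))
valid_out = layers.Conv2D(512, (3, 3), padding="valid")(inp)  # default: shrinks 7x7 -> 5x5
same_out = layers.Conv2D(512, (3, 3), padding="same")(inp)    # keeps 7x7

print(Model(inp, valid_out).predict(feature_map).shape)  # (1, 5, 5, 512)
print(Model(inp, same_out).predict(feature_map).shape)   # (1, 7, 7, 512)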
I was confusing anchor boxes and anchor points.
Anchor points: the coordinates on the feature map (output of the backbone classifier), mapped back onto the input image
Anchor boxes: the scales and aspect ratios of the boxes placed at each anchor point
The mistake I was making in the code below was using number_of_potential_bounding_boxes, which was 784, i.e. feature_width * feature_height * scale_boxes * aspect_ratios. Instead it should be 16, i.e. scale_boxes * aspect_ratios.
# Output layers
output_deltas = layers.Conv2D(filters= 4 * number_of_potential_bounding_boxes,kernel_size=(1, 1),activation="linear",kernel_initializer="uniform",name="Output_Deltas")(convolution_3x3)
output_scores = layers.Conv2D(filters=1 * number_of_potential_bounding_boxes,kernel_size=(1, 1),activation="sigmoid",kernel_initializer="uniform",name="Output_Prob_FG")(convolution_3x3)
So the output should be [:, 7, 7, 16], which is the following:
[number of images to be predicted, height of feature map, width of feature map, number of anchor boxes per anchor point (scales * aspect ratios)]
A corrected sketch of the output layers is shown below.
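Not part of the original post, but a minimal sketch of what the corrected output layers could look like; it reuses convolution_3x3 and config from the question's code and assumes 4 scales x 4 aspect ratios = 16 anchor boxes per feature-map location:

# number of anchor boxes per anchor point (scales * aspect ratios), assumed 4 * 4 = 16
anchors_per_location = len(config["AnchorBox"]["Scales"]) * len(config["AnchorBox"]["AspectRatioW_div_W"])

output_deltas = layers.Conv2D(filters=4 * anchors_per_location, kernel_size=(1, 1),
                              activation="linear", kernel_initializer="uniform",
                              name="Output_Deltas")(convolution_3x3)
output_scores = layers.Conv2D(filters=1 * anchors_per_location, kernel_size=(1, 1),
                              activation="sigmoid", kernel_initializer="uniform",
                              name="Output_Prob_FG")(convolution_3x3)

# With padding='same' on the 3x3 convolution, scores come out as (1, 7, 7, 16)
# and deltas as (1, 7, 7, 64).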
I have implemented two simple RPNs without regression here:
https://github.com/alexshellabear/Simple-Region-Proposal-Network
https://github.com/alexshellabear/Still-Simple-Region-Proposal-Network
I'm using the following example to analyse the performance of a computer vision system depending on the data quality.
Keras Implementation Retinanet: https://keras.io/examples/vision/retinanet/
My goal is to corrupt (stretch, shift) certain percentages (10%, 20%, 30%) of the total bounding boxes across all images. This means that images should be randomly picked and then some of their bounding boxes corrupted so that in total the target percentage is affected.
I'm using TensorFlow Datasets as my training data (e.g. https://www.tensorflow.org/datasets/catalog/kitti).
My basic idea was to generate an array the size of the total amount of boxes, fill it with 1 (modify box) and 0 (ignore box), and then iterate through all boxes:
random_array = np.concatenate((np.ones(int(error_rate_size*TOTAL_NUMBER_OF_BOXES)+1,dtype=int),np.zeros(int((1-error_rate_size)*TOTAL_NUMBER_OF_BOXES)+1,dtype=int)))
The problem is that the implementation I'm using relies heavily on the graph implementation, and specifically on the map function (https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map). I would like to follow this pattern in order to keep the implemented data pipeline.
What I am hoping to do is to use the map function in combination with a global counter, so I can loop through the array and modify a box whenever the condition is met. It should look roughly like this:
COUNT = 0

def damage_data(box):
    scaling_range = 2.0
    global COUNT
    COUNT += 1
    if random_array[COUNT] == 1:
        new_box = tf.stack(
            [
                box[0]*scaling_range*tf.random.uniform(shape=(),minval=0.0,maxval=1.0,dtype=tf.float32,seed=1), # x center
                box[1]*scaling_range*tf.random.uniform(shape=(),minval=0.0,maxval=1.0,dtype=tf.float32,seed=2), # y center
                box[2]*scaling_range*tf.random.uniform(shape=(),minval=0.0,maxval=1.0,dtype=tf.float32,seed=3), # width
                box[3]*scaling_range*tf.random.uniform(shape=(),minval=0.0,maxval=1.0,dtype=tf.float32,seed=4), # height
            ],
            axis=-1,
        )
    else:
        tf.print("Not Changed")
        new_box = tf.stack(
            [
                box[0],
                box[1], # y center
                box[2], # width
                box[3], # height
            ],
            axis=-1,
        )
    return new_box

def damage_data_cross_sequential(image, bbox, class_id):
    # bbox format [x_center, y_center, width, height]
    bbox = tf.map_fn(damage_data, bbox)
    return image, bbox, class_id

train_dataset = train_dataset.map(damage_data_cross_sequential, num_parallel_calls=1)
But using this code, the variable COUNT is not incremented globally; instead, every map() call starts from the initial value 0. I assume this is somehow caused by the graph implementation and the parallel processing in map().
The question is now whether there is any way to globally increment a counter through the map function, or whether I could extend the given dataset with a unique identifier (e.g. add box[5] = id).
I hope the problem is clear and thanks already! :)
--------------UPDATE 1-------------------------------
The second approach as described by @Lescurel is what I'm trying to do.
Some clarifications about the dataset structure.
The number of boxes per image is not identical; it changes from image to image.
e.g. sample 1: ((x_dim, y_dim, 3), (4,4)), sample 2: ((x_dim, y_dim, 3), (2,4))
For a better understanding the structure can be reproduced with the following:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np

valid_ds = tfds.load('kitti', split='validation')  # validation is a smaller set

def select_relevant_info(sample):
    image = sample["image"]
    bbox = sample["objects"]["bbox"]
    class_id = tf.cast(sample["objects"]["type"], dtype=tf.int32)
    return image, bbox, class_id

valid_ds = valid_ds.map(select_relevant_info)

for sample in valid_ds.take(1):
    print(sample)
For plenty of reasons, using a global state is not a terribly good idea, but it's probably even worse in a concurrent context like this one.
There are at least two other ways of implementing what you want:
using a random sample with a threshold as the condition to modify the label
putting your random array in the dataset as the condition to modify the label.
I personally prefer the first option, which is simpler.
An example.
Let's generate some random data and create a tf.data.Dataset. In this example, the total number of samples is 1000:
imgs = tf.random.uniform((1000, 4, 4))
boxes = tf.ones((1000, 4))
ds = tf.data.Dataset.from_tensor_slices((imgs, boxes))
First option: Random Sample
This function will draw a number uniformly between 0 and 1. If this number is higher than the threshold prob, then nothing happens. Otherwise, we modify the label. With prob=0.05, this gives a 5% chance of modifying the label.
def change_label_with_prob(label, prob=0.05, scaling_range=2.):
    return tf.cond(
        tf.random.uniform(()) > prob,
        lambda: label,
        lambda: label*scaling_range*tf.random.uniform((4,), 0., 1., dtype=tf.float32),
    )
You can simply call it with Dataset.map:
new_ds = ds.map(lambda img, box: (img, change_label_with_prob(box)))
Second Option : Pass the condition array around
First, we generate an array filled with our conditions: 1 if we want to modify the label, 0 if not.
# let's set the number to change to 200
N_TO_CHANGE = 200
# randomly shuffled array with 200 ones and 800 zeros
cond_array = tf.random.shuffle(
    tf.concat([tf.ones((N_TO_CHANGE,), dtype=tf.bool), tf.zeros((1000 - N_TO_CHANGE,), dtype=tf.bool)], axis=0)
)
Then we can create a dataset from that array of conditions, and zip it with our previous dataset:
# creating a dataset from the conditional array
ds_cond = tf.data.Dataset.from_tensor_slices(cond_array)
# zipping the two datasets together
ds_data_and_cond = tf.data.Dataset.zip((ds, ds_cond))
# each element of that dataset is ((img, box), cond)
We can write our function, roughly the same as before:
def change_label_with_cond(label, cond, scaling_range=2.0):
    # if true, modifies, do nothing otherwise
    return tf.cond(
        cond,
        lambda: label
        * scaling_range
        * tf.random.uniform((4,), 0.0, 1.0, dtype=tf.float32),
        lambda: label,
    )
And then map the function on our new dataset, paying attention to the nested shape of each element of the dataset:
ds_changed_label = ds_data_and_cond.map(
    lambda img_and_box, z: (img_and_box[0], change_label_with_cond(img_and_box[1], z))
)
# New dataset has a shape (img, box), same as before the zipping
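If you want to verify the result (not part of the original answer, just a quick eager-mode check), you can count how many boxes actually changed; it should match N_TO_CHANGE:

import numpy as np

n_modified = sum(
    int(not np.array_equal(orig_box.numpy(), new_box.numpy()))
    for ((_, orig_box), _), (_, new_box) in zip(ds_data_and_cond, ds_changed_label)
)
print(n_modified)  # should be 200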
I'm having an issue getting useful detections using Python, OpenCV 3.1 and HOG. While I have working code that executes without error, the trained HOG/SVM combination fails to detect on test images.
From OpenCV examples and other Stack Overflow discussions I've developed the following approach.
win_size = (64, 64)
block_size = (16, 16)
block_stride = (8, 8)
cell_size = (8, 8)
nbins = 9
deriv_aperture = 1
win_sigma = 4.
histogram_norm_type = 0
l2_hys_threshold = 2.0000000000000001e-01
gamma_correction = 0
nlevels = 64
hog = cv2.HOGDescriptor(win_size,
                        block_size,
                        block_stride,
                        cell_size,
                        nbins,
                        deriv_aperture,
                        win_sigma,
                        histogram_norm_type,
                        l2_hys_threshold,
                        gamma_correction,
                        nlevels)
window_stride = (8, 8)
padding = (8, 8)
locations = ((0, 0),)
histograms = []
# not showing the loop here but
# create histograms for 600 positive and 600 negative images
# all images are of size 64x64
histograms.append(np.transpose(hog.compute(roi, window_stride, padding, locations)))
training_data = np.concatenate(histograms)
classifications = np.array([1] * 600 + [0] * 600)
svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.setC(0.01)
svm.setTermCriteria((cv2.TermCriteria_MAX_ITER, 100, 1e-6))
svm.train(training_data, cv2.ml.ROW_SAMPLE, classifications)
# testing
test_img = cv2.imread('test_image.jpg')
svmvec = svm.getSupportVectors()[0]
rho = -svm.getDecisionFunction(0)[0]
svmvec = np.append(svmvec, rho)
hog.setSVMDetector(svmvec)
found, w = hog.detectMultiScale(test_img)
In every test, found is a single rectangle centered in the image, and it is not located where the positive example appears in the test image.
I've tried many different combinations of parameters based on Stack Overflow answers and other OpenCV samples and discussions. None of them change the results.
I think that you need all the support vectors you have. So the problem is not your training code, it is your testing code.
svm.train(training_data, cv2.ml.ROW_SAMPLE, classifications)
You do your training with all the data you have, but when it comes to testing, you only use a small part of the resulting classifier.
svmvec = svm.getSupportVectors()[0]
Change this line and you'll have one less problem.
The reason why a single rectangle is created at the center is that the detector classified almost all regions as "human".
By default, detectMultiScale suppresses overlapping rectangles, so you only see the single rectangle at the center.
You can turn off this suppression with the finalThreshold option of detectMultiScale.
hogParams = { 'finalThreshold': 0}
found, w = hog.detectMultiScale(test_img, **hogParams)
By default, this parameter is set to 2.
With the suppression turned off, you can see that almost all regions are filled by rectangles.
My answer to this "misclassification" is a simple change of the order of the labels:
classifications = np.array([0] * 600 + [1] * 600)
I need to use Gaussian Mixture Models on an RGB image, and therefore the dataset is quite big. This needs to run in real time (from a webcam feed). I first coded this with Matlab and I was able to achieve a running time of 0.5 seconds for an image of 1729 × 866. The images for the final application will be smaller, so the timing will be faster.
However, I need to implement this with Python and OpenCV for the final application (I need it to run on an embedded board). I translated all my code and used sklearn.mixture.GMM to replace fitgmdist in Matlab. The line of code creating the GMM model itself executes in only 7.7e-05 seconds, but the one fitting the model takes 19 seconds. I have tried other covariance types, such as 'diag' or 'spherical', and the time does decrease a little, but the results are worse and the time is still not good enough, not even close.
I was wondering if there is any other library I can use, or if it would be worth it to translate the functions from Matlab to Python.
Here is my example:
import cv2
import numpy as np
import math
from sklearn.mixture import GMM
im = cv2.imread('Boat.jpg');
h, w, _ = im.shape; # Height and width of the image
# Extract Blue, Green and Red
imB = im[:,:,0]; imG = im[:,:,1]; imR = im[:,:,2];
# Reshape Blue, Green and Red channels into single-row vectors
imB_V = np.reshape(imB, [1, h * w]);
imG_V = np.reshape(imG, [1, h * w]);
imR_V = np.reshape(imR, [1, h * w]);
# Combine the 3 single-row vectors into a 3-row matrix
im_V = np.vstack((imR_V, imG_V, imB_V));
# Calculate the bimodal GMM
nmodes = 2;
GMModel = GMM(n_components = nmodes, covariance_type = 'full', verbose = 0, tol = 1e-3)
GMModel = GMModel.fit(np.transpose(im_V))
Thank you very much for your help
You can try fitting with a 'diag' or 'spherical' covariance matrix instead of 'full':
covariance_type='diag'
or
covariance_type='spherical'
I believe it will be much faster.
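A minimal sketch of the suggested change, reusing im_V and nmodes from the question's code (note, as an aside, that newer scikit-learn versions replace sklearn.mixture.GMM with sklearn.mixture.GaussianMixture, which accepts the same covariance_type argument):

# same fit as in the question, only the covariance type changes
GMModel = GMM(n_components=nmodes, covariance_type='diag', verbose=0, tol=1e-3)
GMModel = GMModel.fit(np.transpose(im_V))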
I already achieved the goal described in the title but I was wondering if there was a more efficient (or generally better) way to do it. First of all let me introduce the problem.
I have a set of images of different sizes, all with a width/height ratio less than (or equal to) 2 (it could be anything, but let's say 2 for now). I want to normalize each one, meaning I want all of them to have the same size. Specifically, I am going to do it like this:
Extract the max height over all images
Zoom each image so that it reaches the max height while keeping its ratio
Add padding of white pixels to the right until the image has a width/height ratio of 2
Keep in mind the images are represented as numpy matrices of grayscale values in [0, 255].
This is how I'm doing it now in Python:
max_height = numpy.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])

for obs in data:
    if len(obs[0])/len(obs) <= 2:
        new_img = ndimage.zoom(obs, round(max_height/len(obs), 2), order=3)
        missing_cols = max_height * 2 - len(new_img[0])
        norm_img = []
        for row in new_img:
            norm_img.append(np.pad(row, (0, missing_cols), mode='constant', constant_values=255))
        norm_img = np.resize(norm_img, (max_height, max_height*2))
A note about this code:
I'm rounding the zoom ratio because it makes the final height equal to max_height. I'm sure this is not the best approach, but it works (any suggestion is appreciated here). What I'd like to do is expand the image, keeping its ratio, until it reaches a height equal to max_height. This is the only solution I found so far, and it worked right away; the interpolation works pretty well.
So my final questions are:
Is there a better approach to achieve what is explained above (image normalization)? Do you think I could have done this differently? Is there a common good practice I'm not following?
Thanks in advance for your time.
Instead of ndimage.zoom you could use scipy.misc.imresize. This function allows you to specify the target size as a tuple, instead of by zoom factor. Thus you won't have to call np.resize later to get the size exactly as desired.
Note that scipy.misc.imresize calls PIL.Image.resize under the hood, so PIL (or Pillow) is a dependency.
Instead of using np.pad in a for-loop, you could allocate space for the desired array, norm_arr, first:
norm_arr = np.full((max_height, max_width), fill_value=255)
and then copy the resized image, new_arr, into norm_arr:
nh, nw = new_arr.shape
norm_arr[:nh, :nw] = new_arr
For example,
from __future__ import division
import numpy as np
from scipy import misc

data = [np.linspace(255, 0, i*10).reshape(i, 10)
        for i in range(5, 100, 11)]

max_height = np.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])
max_width = 2*max_height

result = []
for obs in data:
    norm_arr = obs
    h, w = obs.shape
    if float(w)/h <= 2:
        scale_factor = max_height/float(h)
        target_size = (max_height, int(round(w*scale_factor)))
        new_arr = misc.imresize(obs, target_size, interp='bicubic')
        norm_arr = np.full((max_height, max_width), fill_value=255)
        # check the shapes
        # print(obs.shape, new_arr.shape, norm_arr.shape)
        nh, nw = new_arr.shape
        norm_arr[:nh, :nw] = new_arr
    result.append(norm_arr)
    # visually check the result
    # misc.toimage(norm_arr).show()