I'm having an issue getting useful detections using Python, OpenCV 3.1 and HOG. I have working code that executes without error, but the trained HOG/SVM combination fails to detect anything on my test images.
From OpenCV examples and other Stack Overflow discussions I've developed the following approach.
win_size = (64, 64)
block_size = (16, 16)
block_stride = (8, 8)
cell_size = (8, 8)
nbins = 9
deriv_aperture = 1
win_sigma = 4.
histogram_norm_type = 0
l2_hys_threshold = 2.0000000000000001e-01
gamma_correction = 0
nlevels = 64
hog = cv2.HOGDescriptor(win_size,
                        block_size,
                        block_stride,
                        cell_size,
                        nbins,
                        deriv_aperture,
                        win_sigma,
                        histogram_norm_type,
                        l2_hys_threshold,
                        gamma_correction,
                        nlevels)
window_stride = (8, 8)
padding = (8, 8)
locations = ((0, 0),)
histograms = []
# not showing the loop here but
# create histograms for 600 positive and 600 negative images
# all images are of size 64x64
histograms.append(np.transpose(hog.compute(roi, window_stride, padding, locations)))
training_data = np.concatenate(histograms)
classifications = np.array([1] * 600 + [0] * 600)
svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.setC(0.01)
svm.setTermCriteria((cv2.TermCriteria_MAX_ITER, 100, 1e-6))
svm.train(training_data, cv2.ml.ROW_SAMPLE, classifications)
# testing
test_img = cv2.imread('test_image.jpg')
svmvec = svm.getSupportVectors()[0]
rho = -svm.getDecisionFunction(0)[0]
svmvec = np.append(svmvec, rho)
hog.setSVMDetector(svmvec)
found, w = hog.detectMultiScale(test_img)
In every test, found is a single rectangle centered in the image, and it is not located where the positive object actually appears in the test image.
I've tried many different combinations of parameters based on Stack Overflow answers and other OpenCV samples and discussions. None of them change the results.
I think you need all of the support vectors you have, so the problem is not your training code; it is your testing.
svm.train(training_data, cv2.ml.ROW_SAMPLE, classifications)
You train with all the data you have, but when it comes to testing you only use a small part of the resulting classifier.
svmvec = svm.getSupportVectors()[0]
Change this line and you'll have one less problem.
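For reference, here is a minimal sketch of assembling the detector from all support vectors weighted by the decision-function coefficients (assuming OpenCV 3.x's ml API and a linear kernel; the sign of the weights may need flipping depending on the label order, as noted at the end of this answer):

# Sketch only: assumes OpenCV 3.x ml API and a linear kernel.
rho, alpha, svidx = svm.getDecisionFunction(0)      # bias, weights, support vector indices
sv = svm.getSupportVectors()                        # one support vector per row
# combine all support vectors into a single linear weight vector
w = np.dot(alpha.ravel(), sv[svidx.ravel().astype(int)])
detector = np.append(w, -rho).astype(np.float32)    # HOG expects [w, bias]
hog.setSVMDetector(detector)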
The reason a single rectangle is created at the center is that the detector classified almost every region as "human".
By default, detectMultiScale suppresses overlapping rectangles, so you only see the single rectangle at the center.
You can turn off this suppression with the finalThreshold option of detectMultiScale.
hogParams = { 'finalThreshold': 0}
found, w = hog.detectMultiScale(test_img, **hogParams)
By default, this parameter is set to 2.
With suppression turned off, you can see that almost every region is covered by rectangles.
My answer to this "misclassification" is a simple change of the order of the labels.
classifications = np.array([0] * 600 + [1] * 600)
After training a model (image classification), I would like to see how differently it performs when I evaluate a clean image versus various noised versions of it.
The type of noise I have in mind is a random change in pixel values. I tried this approach:
# -- inside the generator function that I provide to model.predict_generator --
# dataset is a numpy array of paths to the clean (not-yet-noised) images
dt = tf.data.Dataset.from_generator(lambda: image_generator(dataset),
                                    output_types=(tf.float32))

def image_generator(image_paths):
    for path in image_paths:
        # im is keras.preprocessing.image
        img = im.load_img(path,
                          color_mode='rgb',
                          target_size=(224, 224))
        img_to_numpy = np.array(img)

        for _ in range(0, 5):
            tmp_numpy_image = img_to_numpy.copy()
            for i in range(tmp_numpy_image.shape[0]):
                for j in range(tmp_numpy_image.shape[1]):
                    # add noise to this pixel
                    tmp_numpy_image[i][j] = ...
            yield tmp_numpy_image
This process works, but it is very slow. I also use dataset.batch and dataset.prefetch on dt, and I could not find a combination of values for them that reduces the run time.
Is there a smarter way to do it? I tried yielding the un-noised images and adding the noise later inside dataset.map, but inside map I have to manipulate tensors, and I could not find a way to change each pixel value.
SOLUTION
I used @Marat's approach and it worked like a charm; the whole process went from 20-30 hours down to minutes. My noise was a simple ±1, but I didn't want to overflow (255 + 1 wraps to 0 in uint8), so I also had to use numpy masks:
...
tmp_numpy_image = img_to_numpy.copy()
# the upper bound of randint is exclusive, so +-1 noise needs randint(-1, 2)
noise = np.random.randint(-1, 2, img_to_numpy.shape)
# tmp_numpy_image becomes of type int32 after the addition
tmp_numpy_image = tmp_numpy_image + noise
np.putmask(tmp_numpy_image, tmp_numpy_image < 0, 0)
np.putmask(tmp_numpy_image, tmp_numpy_image > 255, 255)
tmp_numpy_image = tmp_numpy_image.astype('uint8')
yield tmp_numpy_image
The biggest overhead here is pixel operations (double for loop). Vectorizing it should result in substantial speedup:
noise_magnitude = 10
...
img_max_value = img_to_numpy.max() * np.ones(img_to_numpy.shape)
for _ in range(0, 5):
    # depending on the range of your values, you might want to adjust the noise magnitude
    noise = np.random.randint(0, noise_magnitude, img_to_numpy.shape)
    # after adding noise, clip values that exceed the max value (np.minimum caps them at the max)
    yield np.minimum(img_to_numpy + noise, img_max_value)
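The same vectorized noise could probably also be applied inside dataset.map itself, so the generator only yields clean images; here is a rough, untested sketch assuming float32 image tensors in the [0, 255] range and ±1 integer noise:

# Rough sketch, assuming the dataset yields float32 images in [0, 255].
def add_noise(img):
    # draw -1, 0 or +1 per pixel, then clamp back into the valid range
    noise = tf.random.uniform(tf.shape(img), minval=-1, maxval=2, dtype=tf.int32)
    noisy = img + tf.cast(noise, img.dtype)
    return tf.clip_by_value(noisy, 0.0, 255.0)

dt = dt.map(add_noise, num_parallel_calls=tf.data.experimental.AUTOTUNE)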
I am looking to extract and identify digits from an image.
I've read a lot about digit recognition but did not find anything on adding rules to select only the digits we are interested in.
The rules would be "quite simple": I want to extract only the digits surrounded with a blue pen, for example.
I'm not expecting a complete solution here, more some research directions or links to similar problems.
I am quite familiar with neural networks and intend to use one for this, but I cannot see how to filter out only the surrounded digits.
Here is a sample of the picture; imagine the same schema repeated several times in one picture.
I think you have three possible approaches, and maybe you do not need to go that far. For now, we will only look for which digit has been selected.
Case 1: You can try to use the Hough circle transform to find the circles present in the image.
% Solution 1 (practically a perfect circle, use the Hough circle transform to find circles)
im = imread('https://i.stack.imgur.com/L7cE1.png');
[centers, radii, metric] = imfindcircles(im, [10, 60]);
imshow(im); viscircles(centers, radii,'EdgeColor','r');
Case 2: You can work in the blue color space and eliminate achromatic colors to segment the areas that interest you (if you add some margin around the segmented regions, this works correctly).
% Solution 2 (the pen is ALWAYS blue: use only the B channel of RGB and remove achromatic pixels)
b = im(:, :, 3) & (std(double(im(:, :, :)), [], 3) > 5);
bw = imfill(b,'holes');
stats = regionprops('table', bw, 'Centroid', 'MajorAxisLength','MinorAxisLength')
imshow(im); viscircles(stats.Centroid, stats.MajorAxisLength / 2,'EdgeColor','r');
Case 3: You can generate a dataset with positive cases and negative ones, and train a neural network with 10 sigmoid outputs, each indicating whether the corresponding digit is marked or not. The nice thing about this type of model is that you do not need to run OCR afterwards.
import keras
from keras.layers import *
from keras.models import Model
from keras.losses import mean_squared_error
from keras.applications.mobilenet import MobileNet
def model():
    WIDTH, HEIGHT = 128, 128
    mobile_input = Input(shape=(WIDTH, HEIGHT, 3))
    alpha = 0.25  # 0.25, 0.5, 1
    shape = (1, 1, int(1024 * alpha))
    dropout = 0.1
    mobile_model = MobileNet(input_shape=(WIDTH, HEIGHT, 3),
                             alpha=alpha,
                             include_top=False,
                             dropout=dropout,
                             pooling='avg')
    base_model = mobile_model(mobile_input)
    x = Reshape(shape, name='reshape_1')(base_model)
    x_gen = Dropout(dropout, name='dropout')(x)
    x = Conv2D(10, (1, 1), padding='same')(x_gen)
    x = Activation('sigmoid')(x)
    output_detection = Reshape((10,), name='output_mark_detection')(x)
    """x = Conv2D(2 * 10, (1, 1), padding='same')(x_gen)
    x = Activation('sigmoid')(x)
    output_position = Reshape((2 * 10, ), name='output_mark_position')(x)
    output = Concatenate(axis=-1)([output_detection, output_position])
    """
    model = Model(name="mark_net", inputs=mobile_input, outputs=output_detection)
    return model
It depends on your problem; the first cases may be enough for you. If you have varying lighting conditions, rotation, scaling, etc., I advise you to go directly to neural networks; you can create many "artificial" examples:
You can generate an artificial dataset by adding distorted circles (take a normal circle, apply random affine transformations, add noise, vary the blue color and the line a little, etc.).
Then you paste the random circle onto each number and generate the dataset, indicating which numbers are marked.
Once it is "stuck on the paper", you can apply data augmentation again to make it look more real.
You can break the problem into two simpler sub-problems: first train a neural network to recognize the circles and isolate them; once you have done that, train a second neural network to recognize the digits within the isolated subsections. Hope this helps.
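To make the two-stage idea concrete, here is a rough sketch that uses a simple color threshold as a stand-in for the first stage and a hypothetical digit_model for the second (the HSV bounds and area threshold are guesses, not values from the posts above):

import cv2
import numpy as np

def find_circled_digits(image_bgr, digit_model):
    # Stage 1: isolate the blue pen strokes (a color threshold standing in for a circle detector)
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    blue_mask = cv2.inRange(hsv, (90, 50, 50), (130, 255, 255))   # rough blue range in HSV
    contours = cv2.findContours(blue_mask, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]      # works on OpenCV 3 and 4
    results = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        if w * h < 100:                      # ignore tiny specks
            continue
        crop = image_bgr[y:y + h, x:x + w]
        # Stage 2: classify the digit inside the isolated region (digit_model is a placeholder)
        results.append(digit_model.predict(crop))
    return results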
I am working on a captcha recognition project with the Keras library. For the training set, I am using the following function to generate captchas of at most 5 digits.
def genData(n=1000, max_digs=5, width=60):
    capgen = ImageCaptcha()
    data = []
    target = []
    for i in range(n):
        x = np.random.randint(0, 10 ** max_digs)
        img = misc.imread(capgen.generate(str(x)))
        img = np.mean(img, axis=2)[:, :width]
        data.append(img.flatten())
        target.append(x)
    return np.array(data), np.array(target)
Then I am trying to reshape the training data array as follows:
train_data = train_data.reshape(train_data.shape[0], 60, 60, 3)
I guess my captchas have 3 color channels. However, when I try to reshape the training data I get the following error:
ValueError: cannot reshape array of size 3600000 into shape
(1000,60,60,3)
Note: if I try with 1 instead of 3, the error does not occur, but my accuracy is not even close to 1%.
You are creating a single-channel image by taking the mean. The error says that you are trying to reshape an array with 3,600,000 elements into an array three times as big (1000 * 60 * 60 * 3 = 10,800,000). Adapt your function as in the example below to get it to work.
Also, because you are cropping the width of your image to 60 pixels, the target no longer matches the image, which explains the low accuracy. Try using a bigger width (e.g. 150-155) and your accuracy will most likely increase.
def genData(n=1000, max_digs=5, width=60):
    capgen = ImageCaptcha()
    data = []
    target = []
    for i in range(n):
        x = np.random.randint(0, 10 ** max_digs)
        img = misc.imread(capgen.generate(str(x)))
        img = img[:, :width, :]
        data.append(img.flatten())
        target.append(x)
    return np.array(data), np.array(target)
I already achieved the goal described in the title but I was wondering if there was a more efficient (or generally better) way to do it. First of all let me introduce the problem.
I have a set of images of different sizes, all with a width/height ratio less than or equal to 2 (it could be anything, but let's say 2 for now). I want to normalize them, meaning I want all of them to have the same size. Specifically, I am going to do it like this:
Extract the max height above all images
Zoom the image so that each image reaches the max height keeping its ratio
Pad the image on the right with white pixels until it has a width/height ratio of 2
Keep in mind the images are represented as numpy matrices of greyscale values in [0, 255].
This is how I'm doing it now in Python:
max_height = numpy.max([len(obs) for obs in data if len(obs[0]) / len(obs) <= 2])

for obs in data:
    if len(obs[0]) / len(obs) <= 2:
        new_img = ndimage.zoom(obs, round(max_height / len(obs), 2), order=3)
        missing_cols = max_height * 2 - len(new_img[0])
        norm_img = []
        for row in new_img:
            norm_img.append(np.pad(row, (0, missing_cols), mode='constant', constant_values=255))
        norm_img = np.resize(norm_img, (max_height, max_height * 2))
One note about this code: I'm rounding the zoom ratio because that makes the final height come out equal to max_height. I'm sure this is not the best approach, but it works (any suggestion is appreciated here). What I'd like to do is expand the image, keeping its ratio, until it reaches a height equal to max_height. This is the only solution I've found so far, and it worked right away; the interpolation works pretty well.
So my final questions are:
Is there a better approach to achieve what is explained above (image normalization)? Could I have done this differently? Is there a common good practice I'm not following?
Thanks in advance for your time.
Instead of ndimage.zoom you could use scipy.misc.imresize. This function allows you to specify the target size as a tuple instead of a zoom factor, so you won't have to call np.resize afterwards to get exactly the size you want.
Note that scipy.misc.imresize calls PIL.Image.resize under the hood, so PIL (or Pillow) is a dependency.
Instead of using np.pad in a for-loop, you could allocate space for the desired array, norm_arr, first:
norm_arr = np.full((max_height, max_width), fill_value=255)
and then copy the resized image, new_arr, into norm_arr:
nh, nw = new_arr.shape
norm_arr[:nh, :nw] = new_arr
For example,
from __future__ import division
import numpy as np
from scipy import misc

data = [np.linspace(255, 0, i * 10).reshape(i, 10)
        for i in range(5, 100, 11)]

max_height = np.max([len(obs) for obs in data if len(obs[0]) / len(obs) <= 2])
max_width = 2 * max_height

result = []
for obs in data:
    norm_arr = obs
    h, w = obs.shape
    if float(w) / h <= 2:
        scale_factor = max_height / float(h)
        target_size = (max_height, int(round(w * scale_factor)))
        new_arr = misc.imresize(obs, target_size, interp='bicubic')
        norm_arr = np.full((max_height, max_width), fill_value=255)
        # check the shapes
        # print(obs.shape, new_arr.shape, norm_arr.shape)
        nh, nw = new_arr.shape
        norm_arr[:nh, :nw] = new_arr
    result.append(norm_arr)
    # visually check the result
    # misc.toimage(norm_arr).show()
After about 4 weeks of learning and experimenting I finally have a script that does what I want: it changes the perspective of images according to a projection matrix I have created. When I run the script for one image it works fine; however, I would like to plot six images in one figure, and when I try to do that I get a memory error.
All the images are 2448 px wide and 2048 px high. My script:
files = {'cam1': 'c1.jpg',
         'cam2': 'c2.jpg',
         'cam3': 'c3.jpg',
         'cam4': 'c4.jpg',
         'cam5': 'c5.jpg',
         'cam6': 'c6.jpg'}

fig, ax = plt.subplots()

for camname in files:
    img = Image.open(files[camname])
    gray_img = np.asarray(img.convert("L"))
    img = np.asarray(img)
    height, width, channels = img.shape

    usedP = np.array(P[camname][:, [0, 1, 3]])
    usedPinv = np.linalg.inv(usedP)

    U, V = np.meshgrid(range(gray_img.shape[1]),
                       range(gray_img.shape[0]))
    UV = np.vstack((U.flatten(),
                    V.flatten())).T
    ones = np.ones((UV.shape[0], 1))
    UV = np.hstack((UV, ones))

    # create UV_warped
    UV_warped = usedPinv.dot(UV.T).T
    # normalize by dividing by the third column (which should be 1)
    normalize_vector = UV_warped[:, 2].T
    UV_warped = UV_warped / normalize_vector[:, None]

    # masks
    # pixels that are above the horizon, where the V-projection is therefore positive (X in argus): set to 0, 0, 1
    # pixels that are too far away: set to 0, 0, 1
    masks = [UV_warped[:, 0] <= 0, UV_warped[:, 0] > 2000,
             UV_warped[:, 1] > 5000, UV_warped[:, 1] < -5000]  # above horizon: => [0, 0, 1]
    total_mask = masks[0] | masks[1] | masks[2] | masks[3]
    UV_warped[total_mask] = np.array([[0.0, 0.0, 1.0]])

    # show plot
    X_warped = UV_warped[:, 0].reshape((height, width))
    Y_warped = UV_warped[:, 1].reshape((height, width))
    gray_img = gray_img[:-1, :-1]

    # add colors
    rgb = img[:, :-1, :].reshape((-1, 3)) / 255.0  # we have 1 fewer face than grid cells
    rgba = np.concatenate((rgb, np.ones((rgb.shape[0], 1))), axis=1)
    plotimg = ax.pcolormesh(X_warped, Y_warped, img.mean(-1)[:, :], cmap='Greys')
    plotimg.set_array(None)
    plotimg.set_edgecolor('none')
    plotimg.set_facecolor(rgba)

ax.set_aspect('equal')
plt.show()
I have the feeling that numpy.meshgrid is quite memory intensive, but I'm not sure. Does anybody see where my memory gets eaten up so rapidly? (BTW, I have a laptop with 12 GB of RAM, of which other programs use only a small part.)
You might want to profile your code with a memory profiler library; it will show you where your script is using memory.
There is a Stack Overflow question about memory profilers. Also, as a quick way to get an idea of where in the code memory is going out of control, I've used the trick of printing resource.getrusage() results all over the place. It's not clean and it doesn't always work, but it's part of the standard library and it's easy to do.
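For example, a quick-and-dirty check of peak memory at a few points in the loop could look like the sketch below (ru_maxrss is reported in kilobytes on Linux and in bytes on macOS):

import resource

def report_memory(label):
    # peak resident set size of this process so far
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("%s: peak RSS %d" % (label, peak))

report_memory("after meshgrid")
report_memory("after pcolormesh")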
I ordinarily profile with the profile and cProfile modules, as they make testing individual sections of code fairly easy.
Python Profilers
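A minimal cProfile run might look like the sketch below; note that these modules measure time rather than memory, so they mainly help narrow down which section to inspect ('main()' is a placeholder for your own entry point):

import cProfile
import pstats

cProfile.run('main()', 'plot_profile')            # profile the script's entry point
stats = pstats.Stats('plot_profile')
stats.sort_stats('cumulative').print_stats(10)    # show the 10 most expensive calls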