I want to extract HOG features from line images of Arabic handwriting, using the code below. Could someone help me with how to read in an image and how to get the feature vector out?
import numpy as np
from numpy import sqrt, pi, arctan2, cos, sin  # these helpers were removed from the scipy namespace
from scipy.ndimage import uniform_filter
def hog(image, orientations=9, pixels_per_cell=(8, 8),
cells_per_block=(3, 3), visualise=False, normalise=False):
"""Extract Histogram of Oriented Gradients (HOG) for a given image.
Compute a Histogram of Oriented Gradients (HOG) by
1. (optional) global image normalisation
2. computing the gradient image in x and y
3. computing gradient histograms
4. normalising across blocks
5. flattening into a feature vector
Parameters
----------
image : (M, N) ndarray
Input image (greyscale).
orientations : int
Number of orientation bins.
pixels_per_cell : 2 tuple (int, int)
Size (in pixels) of a cell.
cells_per_block : 2 tuple (int,int)
Number of cells in each block.
visualise : bool, optional
Also return an image of the HOG.
normalise : bool, optional
Apply power law compression to normalise the image before
processing.
Returns
-------
newarr : ndarray
HOG for the image as a 1D (flattened) array.
hog_image : ndarray (if visualise=True)
A visualisation of the HOG image.
References
----------
* http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients
* Dalal, N and Triggs, B, Histograms of Oriented Gradients for
Human Detection, IEEE Computer Society Conference on Computer
Vision and Pattern Recognition 2005 San Diego, CA, USA
"""
image = np.atleast_2d(image)
"""
The first stage applies an optional global image normalisation
equalisation that is designed to reduce the influence of illumination
effects. In practice we use gamma (power law) compression, either
computing the square root or the log of each colour channel.
Image texture strength is typically proportional to the local surface
illumination so this compression helps to reduce the effects of local
shadowing and illumination variations.
"""
if image.ndim > 3:
raise ValueError("Currently only supports grey-level images")
if normalise:
image = sqrt(image)
"""
The second stage computes first order image gradients. These capture
contour, silhouette and some texture information, while providing
further resistance to illumination variations. The locally dominant
colour channel is used, which provides colour invariance to a large
extent. Variant methods may also include second order image derivatives,
which act as primitive bar detectors - a useful feature for capturing,
e.g. bar like structures in bicycles and limbs in humans.
"""
gx = np.zeros(image.shape)
gy = np.zeros(image.shape)
gx[:, :-1] = np.diff(image, n=1, axis=1)
gy[:-1, :] = np.diff(image, n=1, axis=0)
"""
The third stage aims to produce an encoding that is sensitive to
local image content while remaining resistant to small changes in
pose or appearance. The adopted method pools gradient orientation
information locally in the same way as the SIFT [Lowe 2004]
feature. The image window is divided into small spatial regions,
called "cells". For each cell we accumulate a local 1-D histogram
of gradient or edge orientations over all the pixels in the
cell. This combined cell-level 1-D histogram forms the basic
"orientation histogram" representation. Each orientation histogram
divides the gradient angle range into a fixed number of
predetermined bins. The gradient magnitudes of the pixels in the
cell are used to vote into the orientation histogram.
"""
magnitude = sqrt(gx ** 2 + gy ** 2)
orientation = arctan2(gy, (gx + 1e-15)) * (180 / pi) + 90
sy, sx = image.shape
cx, cy = pixels_per_cell
bx, by = cells_per_block
n_cellsx = int(np.floor(sx // cx)) # number of cells in x
n_cellsy = int(np.floor(sy // cy)) # number of cells in y
# compute orientations integral images
orientation_histogram = np.zeros((n_cellsy, n_cellsx, orientations))
for i in range(orientations):
#create new integral image for this orientation
# isolate orientations in this range
temp_ori = np.where(orientation < 180 / orientations * (i + 1),
orientation, 0)
temp_ori = np.where(orientation >= 180 / orientations * i,
temp_ori, 0)
# select magnitudes for those orientations
cond2 = temp_ori > 0
temp_mag = np.where(cond2, magnitude, 0)
orientation_histogram[:, :, i] = uniform_filter(temp_mag, size=(cy, cx))[cy // 2::cy, cx // 2::cx]  # integer division: float slice steps raise a TypeError
# now for each cell, compute the histogram
#orientation_histogram = np.zeros((n_cellsx, n_cellsy, orientations))
radius = min(cx, cy) // 2 - 1
hog_image = None
if visualise:
hog_image = np.zeros((sy, sx), dtype=float)
if visualise:
from skimage import draw
for x in range(n_cellsx):
for y in range(n_cellsy):
for o in range(orientations):
centre = tuple([y * cy + cy // 2, x * cx + cx // 2])
dx = radius * cos(float(o) / orientations * np.pi)
dy = radius * sin(float(o) / orientations * np.pi)
rr, cc = draw.line(int(centre[0] - dx), int(centre[1] - dy),
                   int(centre[0] + dx), int(centre[1] + dy))  # draw.bresenham was replaced by draw.line, which needs integer endpoints
hog_image[rr, cc] += orientation_histogram[y, x, o]
"""
The fourth stage computes normalisation, which takes local groups of
cells and contrast normalises their overall responses before passing
to next stage. Normalisation introduces better invariance to illumination,
shadowing, and edge contrast. It is performed by accumulating a measure
of local histogram "energy" over local groups of cells that we call
"blocks". The result is used to normalise each cell in the block.
Typically each individual cell is shared between several blocks, but
its normalisations are block dependent and thus different. The cell
thus appears several times in the final output vector with different
normalisations. This may seem redundant but it improves the performance.
We refer to the normalised block descriptors as Histogram of Oriented
Gradient (HOG) descriptors.
"""
n_blocksx = (n_cellsx - bx) + 1
n_blocksy = (n_cellsy - by) + 1
normalised_blocks = np.zeros((n_blocksy, n_blocksx,
by, bx, orientations))
for x in range(n_blocksx):
for y in range(n_blocksy):
block = orientation_histogram[y:y + by, x:x + bx, :]
eps = 1e-5
normalised_blocks[y, x, :] = block / sqrt(block.sum() ** 2 + eps)
"""
The final step collects the HOG descriptors from all blocks of a dense
overlapping grid of blocks covering the detection window into a combined
feature vector for use in the window classifier.
"""
if visualise:
return normalised_blocks.ravel(), hog_image
else:
return normalised_blocks.ravel()
You can use the OpenCV library to read image files into NumPy arrays.
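For example, a minimal usage sketch (the file name is illustrative): cv2.imread returns a NumPy array, and IMREAD_GRAYSCALE gives the 2-D grey-level image that hog() expects.

import cv2
import numpy as np

image = cv2.imread("line_image.png", cv2.IMREAD_GRAYSCALE)
features = hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(3, 3))
print(features.shape)                       # 1-D feature vector
np.save("line_image_hog.npy", features)     # one way to store the features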
I have a fixed camera mounted on a wall viewing a rectangular lawn at an angle. My goal is to obtain an undistorted, top-down view of the lawn.
I have an image from the camera as a python numpy array which looks like this:
raw camera image
I use an inverse matrix with skimage.transform.warp to correct the image to a top down view:
top down distorted
This works perfectly, however the camera lens introduces barrel distortion.
Separately, I can correct the distortion with a generated lookup table, using skimage.transform.warp_coords and passing a simple undistort callable based on the algorithm described here.
The image is then generated using scipy.ndimage.map_coordinates:
undistorted camera view
These 2 processes work individually, but how do I combine them to create an undistorted top-down view, without creating an intermediate image?
I could run each point in the lookup table through the matrix to create a new table, but the table is massive and memory is tight (Raspberry Pi Zero).
I would like to define the undistortion as a matrix and just combine the 2 matrices, but as I understand it, the projective homography matrix is linear but undistortion is non-linear, so this can't be done. I can't use OpenCV due to resource constraints, and the calibration procedure involving multiple chessboard images is impractical. Currently, I calibrate by taking 4 lawn corner points and generate the matrix from them, which works well.
I would have anticipated that this is a common problem in Computer Vision but can't find any suitable solutions.
The barrel distortion is nonlinear, but it is also smooth. This means it can be well approximated by a piecewise linear mapping.
So you do not need a large, per-pixel lookup table of undistortion displacements. Rather, you can subsample it (or simply scale it down) and use bilinear interpolation for the in-between pixels.
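A minimal sketch of that subsampling idea (not from the original answer), assuming dense_map is a (2, rows, cols) coordinate array like the one skimage.transform.warp_coords produces below; the factor of 8 is illustrative. In practice you would build the coarse map directly, but the interpolation step is the same.

import numpy as np
from scipy.ndimage import zoom

factor = 8
coarse_map = dense_map[:, ::factor, ::factor].astype(np.float32)   # keep every 8th entry only

# rebuild an approximate full-resolution map with bilinear interpolation (order=1);
# the stored values are full-resolution source coordinates, so only the grid is coarse
approx_map = zoom(coarse_map, (1, factor, factor), order=1)
approx_map = approx_map[:, :dense_map.shape[1], :dense_map.shape[2]]   # crop any rounding excess

# approx_map can be passed to scipy.ndimage.map_coordinates or skimage.transform.warp
# in place of the full per-pixel table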
I have found a solution that appears to work, by creating separate functions for the undistortion and the transformation, then chaining them together.
The skimage source code here has the _apply_mat method for generating a mapping from a matrix. I based my unwarp function on that:
def unwarp(coords, matrix):
coords = np.array(coords, copy=False, ndmin=2)
x, y = np.transpose(coords)
src = np.vstack((x, y, np.ones_like(x)))
dst = src.T @ matrix.T
# below, we will divide by the last dimension of the homogeneous
# coordinate matrix. In order to avoid division by zero,
# we replace exact zeros in this column with a very small number.
dst[dst[:, 2] == 0, 2] = np.finfo(float).eps
# rescale to homogeneous coordinates
dst[:, :2] /= dst[:, 2:3]
return dst[:, :2]
I created a similar function for undistorting based on Tanner Helland's algorithm:
def undistort(coords, cols, rows, correction_radius, zoom):
half_width = cols / 2
half_height = rows / 2
new_x = coords[:, 0] - half_width
new_y = coords[:, 1] - half_height
distance = np.hypot(new_x, new_y)
r = distance / correction_radius
theta = np.ones_like(r)
# only process non-zero values
np.divide(np.arctan(r), r, out=theta, where=r!=0)
source_x = half_width + theta * new_x * zoom
source_y = half_height + theta * new_y * zoom
result = np.column_stack([source_x, source_y])
return result
The only tricky bit here is the divide where we need to prevent division by zero.
Once we have each lookup table we can chain them together:
def undistort_unwarp(coords):
    # the extra parameters of undistort() and unwarp() are bound beforehand (see the note below)
    undistorted = undistort(coords)
    both = unwarp(undistorted)
    return both
Note that these are the callable functions passed to skimage.transform.warp_coords:
mymap = tf.warp_coords(undistort_unwarp, shape=(rows, cols), dtype=np.int16)
The map can then be passed to the skimage.transform.warp function.
Francesco's answer was helpful; however, I needed the full pixel resolution for the transformation, so I used it for the undistortion as well and looked at other ways to reduce memory consumption.
Each map consumes
rows * cols * bytes-per-item * 2 (x and y)
bytes. The default datatype is float64, which requires 8 bytes per item, and the documentation suggests sane choices would be the default or float32 at 4 bytes per item. I was able to reduce this to 2 bytes per item using int16 with no visible ill effects, but I suspect the spline interpolation is then not being used to its full potential (if at all).
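As a rough worked example (the resolution is illustrative, not from the original post): a 1640 x 1232 map stored as float64 needs 1640 * 1232 * 8 * 2 ≈ 32 MB, float32 about 16 MB, and int16 about 8 MB.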
The map is the same for each channel of a colour RGB image. However, when I called warp_coords with shape=(rows, cols, 3) I got 3 duplicate maps, so I created a function to handle colour images by processing each channel separately:
def warp_colour(img_arr, coord_map):
if img_arr.ndim == 3:
# colour
rows, cols, _chans = img_arr.shape
r_arr = tf.warp(img_arr[:, :, 0], inverse_map=coord_map, output_shape=(rows, cols))
g_arr = tf.warp(img_arr[:, :, 1], inverse_map=coord_map, output_shape=(rows, cols))
b_arr = tf.warp(img_arr[:, :, 2], inverse_map=coord_map, output_shape=(rows, cols))
rgb_arr = np.dstack([r_arr, g_arr, b_arr])
else:
# grayscale
rows, cols = img_arr.shape
rgb_arr = tf.warp(img_arr, inverse_map=coord_map, output_shape=(rows, cols))
return rgb_arr
One issue with skimage.transform.warp_coords is that it does not have the map_args dictionary parameter that skimage.transform.warp has. I had to call my unwarp and undistort functions through an intermediate function to add the parameters.
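One way to do that intermediate binding (not from the original post) is functools.partial; the parameter values below are illustrative.

from functools import partial
import numpy as np
from skimage import transform as tf

rows, cols = 480, 640                      # illustrative image size
matrix = np.eye(3)                         # illustrative homography matrix

undistort_bound = partial(undistort, cols=cols, rows=rows,
                          correction_radius=400, zoom=1.0)
unwarp_bound = partial(unwarp, matrix=matrix)

def undistort_unwarp(coords):
    return unwarp_bound(undistort_bound(coords))

coord_map = tf.warp_coords(undistort_unwarp, shape=(rows, cols))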
Good morning everybody! I created a physically accurate scene in Blender and my aim, using Python, is to study the radiometric conditions over the rendered scene in order to obtain an illumination map in [W/m^2]. I saved the images in the OpenEXR file format because of its high-dynamic-range properties, and I want to obtain a relative luminance map starting from the RGB values in the "R", "G" and "B" channels. The main issue is how to scale the linear values from the OpenEXR channels to physically meaningful values in [0, 1], which are needed to obtain the relative luminance map while maintaining the HDR properties of the file format. Part of the code is reported below.
import numpy as np
import Imath
import OpenEXR as exr

pt = Imath.PixelType(Imath.PixelType.FLOAT)
exrfile = exr.InputFile(filename)
dw = exrfile.header()['dataWindow']
size = (dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1)
redstr = exrfile.channel('R', pt)
red = np.frombuffer(redstr, dtype=np.float32)  # np.fromstring is deprecated
red.shape = (size[1], size[0]) # Numpy arrays are (row, col)
greenstr = exrfile.channel('G', pt)
green = np.frombuffer(greenstr, dtype=np.float32)
green.shape = (size[1], size[0]) # Numpy arrays are (row, col)
bluestr = exrfile.channel('B', pt)
blue = np.frombuffer(bluestr, dtype=np.float32)
blue.shape = (size[1], size[0]) # Numpy arrays are (row, col)
rel_luminance = 0.2126*red[:,:]+0.7152*green[:,:]+0.0722*blue[:,:]
For a test image the obtained Max values of the three channels are respectively:
Max(R) = 198.16421508789062
Max(G) = 173.5792999267578
Max(B) = 163.20120239257812
The obtained values are obviously not in the range [0, 1]; moreover, I am not able to work out the global maximum value with which to scale the channels and obtain what I want.
Does anyone have some tips to solve my problem? Thanks in advance.
A few points…
RGB is tristimulus information. It will never be “radiometric” but rather radiometric-like representations within the limitations of tristimulus.
There is no limitation on the upper or lower limits in an EXR tristimulus encoding. The meaning comes from the ratios between the values, or an additional piece of information in the rare case the units are intended to be absolute.
A good rule of thumb is that any time the term lum appears in a word, it is photometric (i.e. the human-centric domain), whereas rad is likely radiometric. Illuminance, luminance, etc. are photometrically weighted values, while irradiance and radiance are on the physical-model side.
Calculating achromatic luminance from an RGB triplet is a weighted sum of components. For BT.709-based sRGB tristimulus systems, that weighting is 0.2126 * R + 0.7152 * G + 0.0722 * B. Again, note this is an approximation based on the CIE 1924 luminous efficiency function. Also note that luminance does not adequately represent the cumulative equivalent achromatic luminance contribution.
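As a small illustration using the R, G and B arrays from the question, a sketch of the weighted sum plus one possible [0, 1] mapping (dividing by the scene maximum is only a convention; EXR data itself carries no absolute scale):

import numpy as np

# BT.709 luminance weighting of the linear EXR channels
rel_luminance = 0.2126 * red + 0.7152 * green + 0.0722 * blue

# one simple display-referred option: normalise by the scene maximum
normalised = rel_luminance / rel_luminance.max()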
I have some images for which I want to calculate the Minkowski/box count dimension to determine the fractal characteristics in the image. Here are 2 example images:
10.jpg:
24.jpg:
I'm using the following code to calculate the fractal dimension:
import numpy as np
import imageio  # scipy.misc.imread has been removed from SciPy; imageio.imread is used below instead
def rgb2gray(rgb):
r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
return gray
def fractal_dimension(Z, threshold=0.9):
# Only for 2d image
assert(len(Z.shape) == 2)
# From https://github.com/rougier/numpy-100 (#87)
def boxcount(Z, k):
S = np.add.reduceat(
np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
np.arange(0, Z.shape[1], k), axis=1)
# We count non-empty (0) and non-full boxes (k*k)
return len(np.where((S > 0) & (S < k*k))[0])
# Transform Z into a binary array
Z = (Z < threshold)
# Minimal dimension of image
p = min(Z.shape)
# Greatest power of 2 less than or equal to p
n = 2**np.floor(np.log(p)/np.log(2))
# Extract the exponent
n = int(np.log(n)/np.log(2))
# Build successive box sizes (from 2**n down to 2**1)
sizes = 2**np.arange(n, 1, -1)
# Actual box counting with decreasing size
counts = []
for size in sizes:
counts.append(boxcount(Z, size))
# Fit the successive log(sizes) with log (counts)
coeffs = np.polyfit(np.log(sizes), np.log(counts), 1)
return -coeffs[0]
I = rgb2gray(imageio.imread("24.jpg"))
print("Minkowski–Bouligand dimension (computed): ", fractal_dimension(I))
From the literature I've read, it has been suggested that natural scenes (e.g. 24.jpg) are more fractal in nature, and thus should have a larger fractal dimension value.
The results I get point in the opposite direction to what the literature suggests:
10.jpg: 1.259
24.jpg: 1.073
I would expect the fractal dimension for the natural image to be larger than for the urban one.
Am I calculating the value incorrectly in my code? Or am I just interpreting the results incorrectly?
When computing the fractal dimension of something physical, the estimate may converge to different values at different scales. For example, a very thin line (but of finite width) would initially seem one-dimensional, then eventually two-dimensional as its width becomes comparable to the size of the boxes used.
Let's look at the dimensions that you have produced:
What do you see? Well, the linear fits are not so good, and the dimension is heading towards a value of two.
To diagnose this, let's take a look at the grey-scale images produced with the threshold that you have (that is, 0.9):
The nature picture has almost become an ink blob. The dimension would go to a value of 2 very soon, as the graphs told us, because we have pretty much lost the image.
And now with a threshold of 50?
With the new linear fits, which are much better, the dimensions are 1.6 and 1.8 for the urban and nature images respectively. Keep in mind that the urban picture actually has a lot of structure to it, in particular on the textured walls.
In future, good threshold values would be ones closer to the mean of the grey-scale image; that way your image does not turn into a blob of ink!
A good textbook on this is "Fractals Everywhere" by Michael F. Barnsley.
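A small sketch of the threshold suggestion above, reusing rgb2gray() and fractal_dimension() from the question (the exact threshold choice is illustrative):

import imageio

I = rgb2gray(imageio.imread("24.jpg"))
# threshold near the mean grey level instead of the fixed 0.9
print("dimension at mean threshold:", fractal_dimension(I, threshold=I.mean()))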
I need to fit several thousand 2D Gaussian functions to star profiles (14x14 pixel blocks) on a CCD image and get the centroid coordinates, the FWHM over the long and short axes, and the rotation angle of the long axis. The problem is that my current code takes too long to execute: several tens of seconds on an i7 processor, and I need it to run much faster, preferably as fast as possible. I tested several Gaussian fitting functions and it appears that the one used in AsPyLib is the fastest: http://www.aspylib.com/doc/aspylib_fitting.html
Below is the code that I'm trying to make run faster. Profiling showed that most of the time is spent inside the mplfit function, so my question is whether this can be accelerated. I tried Cythonizing the code but it provided only a minor boost. Calculating moments (10x faster) is not suitable for many of my images due to noise issues, which make the estimates unreliable, and probably also because the star profiles are often far from Gaussian due to aberrations.
Multiprocessing wasn't a solution either, as the overhead of creating a new process was too high for fitting just one star profile.
Any ideas where to look further?
import numpy as np
from scipy.optimize import leastsq
def fit_gauss_elliptical(xy, data):
"""
---------------------
Purpose
Fitting a star with a 2D elliptical gaussian PSF.
---------------------
Inputs
* xy (list) = list with the form [x,y] where x and y are the integer positions in the complete image of the first pixel (the one with x=0 and y=0) of the small subimage that is used for fitting.
* data (2D Numpy array) = small subimage, obtained from the full FITS image by slicing. It must contain a single object : the star to be fitted, placed approximately at the center.
---------------------
Output (list) = list with 8 elements, in the form [maxi, floor, height, mean_x, mean_y, fwhm_small, fwhm_large, angle]. The list elements are respectively:
- maxi is the value of the star maximum signal,
- floor is the level of the sky background (fit result),
- height is the PSF amplitude (fit result),
- mean_x and mean_y are the star centroid x and y positions, on the full image (fit results),
- fwhm_small is the smallest full width half maximum of the elliptical gaussian PSF (fit result) in pixels
- fwhm_large is the largest full width half maximum of the elliptical gaussian PSF (fit result) in pixels
- angle is the angular direction of the largest fwhm, measured clockwise starting from the vertical direction (fit result) and expressed in degrees. The direction of the smallest fwhm is obtained by adding 90 deg to angle.
---------------------
"""
#find starting values
dat=data.flatten()
maxi = data.max()
floor = np.ma.median(dat)
height = maxi - floor
if height==0.0: #if star is saturated it could be that median value is 32767 or 65535 --> height=0
floor = np.mean(dat)
height = maxi - floor
mean_x = (np.shape(data)[0]-1)/2
mean_y = (np.shape(data)[1]-1)/2
fwhm = np.sqrt(np.sum((data>floor+height/2.).flatten()))
fwhm_1 = fwhm
fwhm_2 = fwhm
sig_1 = fwhm_1 / (2.*np.sqrt(2.*np.log(2.)))
sig_2 = fwhm_2 / (2.*np.sqrt(2.*np.log(2.)))
angle = 0.
p0 = floor, height, mean_x, mean_y, sig_1, sig_2, angle
#---------------------------------------------------------------------------------
#fitting gaussian
def gauss(floor, height, mean_x, mean_y, sig_1, sig_2, angle):
A = (np.cos(angle)/sig_1)**2. + (np.sin(angle)/sig_2)**2.
B = (np.sin(angle)/sig_1)**2. + (np.cos(angle)/sig_2)**2.
C = 2.0*np.sin(angle)*np.cos(angle)*(1./(sig_1**2.)-1./(sig_2**2.))
#do not forget factor 0.5 in exp(-0.5*r**2./sig**2.)
return lambda x,y: floor + height*np.exp(-0.5*(A*((x-mean_x)**2)+B*((y-mean_y)**2)+C*(x-mean_x)*(y-mean_y)))
def err(p,data):
return np.ravel(gauss(*p)(*np.indices(data.shape))-data)
p = leastsq(err, p0, args=(data,), maxfev=200)  # args must be a tuple
p = p[0]
#---------------------------------------------------------------------------------
#formatting results
floor = p[0]
height = p[1]
mean_x = p[2] + xy[0]
mean_y = p[3] + xy[1]
#angle gives the direction of the p[4]=sig_1 axis, starting from x (vertical) axis, clockwise in direction of y (horizontal) axis
if np.abs(p[4])>np.abs(p[5]):
fwhm_large = np.abs(p[4]) * (2.*np.sqrt(2.*np.log(2.)))
fwhm_small = np.abs(p[5]) * (2.*np.sqrt(2.*np.log(2.)))
angle = np.arctan(np.tan(p[6]))
else: #then sig_1 is the smallest : we want angle to point to sig_y, the largest
fwhm_large = np.abs(p[5]) * (2.*np.sqrt(2.*np.log(2.)))
fwhm_small = np.abs(p[4]) * (2.*np.sqrt(2.*np.log(2.)))
angle = np.arctan(np.tan(p[6]+np.pi/2.))
output = [maxi, floor, height, mean_x, mean_y, fwhm_small, fwhm_large, angle]
return output
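A small usage sketch, not from the original post: fit a synthetic, noisy 14x14 block with the function above (all numbers are illustrative).

import numpy as np

x, y = np.indices((14, 14))                 # axis 0 plays the role of "x" in this function's convention
block = 100.0 + 500.0 * np.exp(-0.5 * (((x - 7.2) / 1.8) ** 2 + ((y - 6.8) / 2.5) ** 2))
block += np.random.normal(0.0, 5.0, block.shape)

# [120, 340] would be the position of the block's (0, 0) pixel in the full frame
maxi, floor, height, mean_x, mean_y, fwhm_small, fwhm_large, angle = \
    fit_gauss_elliptical([120, 340], block)
print(mean_x, mean_y, fwhm_small, fwhm_large, angle)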
I have come across this method before but not implemented it:
http://www.nature.com/nmeth/journal/v9/n7/full/nmeth.2071.html
Might be worth a look as a short-cut method. There is even code to copy on his website http://physics-server.uoregon.edu/~raghu/particle_tracking.html
This question is related to Transformation between two sets of points. However, this one is better specified, and some assumptions have been added.
I have an image of an element and a model image.
I've detected contours in both:
contoursModel0, hierarchyModel = cv2.findContours(model.copy(), cv2.RETR_LIST,
cv2.CHAIN_APPROX_SIMPLE);
contoursModel = [cv2.approxPolyDP(cnt, 2, True) for cnt in contoursModel0];
contours0, hierarchy = cv2.findContours(canny.copy(), cv2.RETR_LIST,
cv2.CHAIN_APPROX_SIMPLE);
contours = [cv2.approxPolyDP(cnt, 2, True) for cnt in contours0];
Then I matched each contour against each of the others:
modelMassCenters = [];
imageMassCenters = [];
for cnt in contours:
for cntModel in contoursModel:
result = cv2.matchShapes(cnt, cntModel, cv2.cv.CV_CONTOURS_MATCH_I1, 0);
if(result != 0):
if(result < 0.05):
#Here are matched contours
momentsModel = cv2.moments(cntModel);
momentsImage = cv2.moments(cnt);
massCenterModel = (momentsModel['m10']/momentsModel['m00'],
momentsModel['m01']/momentsModel['m00']);
massCenterImage = (momentsImage['m10']/momentsImage['m00'],
momentsImage['m01']/momentsImage['m00']);
modelMassCenters.append(massCenterModel);
imageMassCenters.append(massCenterImage);
The matched contours act as a kind of feature.
Now I want to determine the transformation between these two sets of points.
Assumptions: the element is a rigid body; only rotation, displacement and a scale change are allowed.
Some features may be misdetected, so how do I eliminate them? I've used cv2.findHomography before: it takes two vectors of points and calculates the homography between them even when there are some mismatches.
cv2.getAffineTransform takes only three points (so it can't cope with mismatches), and here I have multiple features.
The answer to my previous question says how to calculate this transformation, but it does not handle mismatches. I also think it should be possible to return some quality level from the algorithm (by checking how many points are mismatched after computing the transformation from the rest).
And the last question: should I use all the contour points to compute the transformation, or treat only the mass centres of these shapes as features?
To illustrate, I've added a simple image. Features in green are good matches, those in red are bad matches. Here the match should be computed from the 3 green features, and the red mismatches should only affect the match quality.
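One option not mentioned in the original post: since the mass centres form two point sets with possible mismatches, OpenCV's RANSAC-based estimator for exactly this restricted transform (rotation + uniform scale + translation) rejects mismatched pairs automatically. A rough sketch, assuming the mass-centre lists built above:

import numpy as np
import cv2

src = np.float32(modelMassCenters).reshape(-1, 1, 2)
dst = np.float32(imageMassCenters).reshape(-1, 1, 2)

# 2x3 similarity transform; RANSAC marks mismatched pairs as outliers
M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC,
                                         ransacReprojThreshold=3.0)
print(M)
print(inliers.ravel())   # 1 = pair used as an inlier, 0 = rejected mismatch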
I'm adding fragments of the solution I've figured out so far (but I think it could be done much better):
rotations = [];
scales = [];
# angle() and length() are helper functions for the angle between and lengths of 2D vectors (not shown)
for i in range(0, len(modelMassCenters) - 1):
    for j in range(i + 1, len(modelMassCenters)):  # include the last point as well
        x1, y1 = modelMassCenters[i];
        x2, y2 = modelMassCenters[j];
        modelVec = (x2 - x1, y2 - y1);
        x1, y1 = imageMassCenters[i];
        x2, y2 = imageMassCenters[j];
        imageVec = (x2 - x1, y2 - y1);
        rotation = angle(modelVec, imageVec);
        rotations.append((i, j, rotation));
        scale = length(modelVec) / length(imageVec);
        scales.append((i, j, scale));
After computing the scale and rotation given by each pair of corresponding lines, I'm going to find the median value and then average the rotation values that do not differ from the median by more than some delta. The same for scale. The points that produced the retained values will then be used to compute the displacement.
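A sketch of that median-based filtering (not from the original post), assuming the rotations and scales lists built in the loop above; the delta values are illustrative.

import numpy as np

rot_vals = np.array([r for _, _, r in rotations])
scale_vals = np.array([s for _, _, s in scales])

rot_med = np.median(rot_vals)
scale_med = np.median(scale_vals)

rot_keep = np.abs(rot_vals - rot_med) < 0.05        # rotation delta (radians), illustrative
scale_keep = np.abs(scale_vals - scale_med) < 0.05  # scale delta, illustrative

rotation_estimate = rot_vals[rot_keep].mean()
scale_estimate = scale_vals[scale_keep].mean()
# the (i, j) index pairs behind the kept values identify the trusted features,
# which can then be used to estimate the displacement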
Your second step (match contours to each other by doing a pairwise shape comparison) sounds very vulnerable to errors if features have a similar shape, e.g., you have several similar-sized circular contours. Yet if you have a rigid body with 5 circular features in one quadrant only, you could get a very robust estimate of the affine transform if you consider the body and its features as a whole. So don't discard information like a feature's range and direction from the center of the whole body when matching features. Those are at least as important in correlating features as size and shape of the individual contour.
I'd try something like (untested pseudocode):
"""
Convert from rectangular (x,y) to polar (r,w)
r = sqrt(x^2 + y^2)
w = arctan(y/x) = [-\pi,\pi]
"""
def polar(x, y): # w in radians
from math import hypot, atan2, pi
return hypot(x, y), atan2(y, x)
model_features = []
model = params(model_body_contour) # return tuple (center_x, center_y, area)
for contour in model_feature_contours:
f = params(contour)
range, angle = polar(f[0]-model[0], f[1]-model[1])
model_features.append((angle, range, f[2]))
image_features = []
image = params(image_body_contour)
for contour in image_feature_contours:
f = params(contour)
range, angle = polar(f[0]-image[0], f[1]-image[1])
image_features.append((angle, range, f[2]))
# sort image_features and model_features by angle, range
#
# correlate image_features against model_features across angle offsets
# rotation = angle offset of max correlation
# scale = average(model areas and ranges) / average(image areas and ranges)
If you have very challenging images, such as a ring of 6 equally-spaced similar-sized features, 5 of which have the same shape and one is different (e.g. 5 circles and a star), you could add extra parameters such as eccentricity and sharpness to the list of feature parameters, and include them in the correlation when searching for the rotation angle.