Image-processing convolution kernels are calculated dynamically - python

Using standard numpy and cv2.filter2D solutions I can apply static convolutions to an image:
import numpy as np
import cv2

convolution_kernel = np.array([[-2, -1, 0],
                               [-1,  1, 1],
                               [ 0,  1, 2]])

image = cv2.imread('1.png')
result = cv2.filter2D(image, -1, convolution_kernel)
(example from https://stackoverflow.com/a/58383803/3310334)
Every pixel at [i, j] in the output image has a value calculated by centering a 3x3 "window" onto [i, j] in the input image, and then multiplying each value in the window by the corresponding value in the convolution kernel (Hadamard product) and finally summing the 9 products to get the value for [i, j] in the output image (for each color channel).
(image from: https://github.com/ashushekar/image-convolution-from-scratch#convolution)
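To make the per-pixel computation concrete, here's a minimal sketch for a single interior pixel of a hypothetical single-channel image (border handling ignored):
import numpy as np

img = np.arange(25, dtype=float).reshape(5, 5)  # hypothetical 5x5 single-channel image
kernel = np.array([[-2, -1, 0],
                   [-1,  1, 1],
                   [ 0,  1, 2]], dtype=float)

i, j = 2, 2                          # an interior pixel
window = img[i-1:i+2, j-1:j+2]       # 3x3 window centered on (i, j)
out_ij = np.sum(window * kernel)     # Hadamard product, then sum of the 9 products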
In my case, the function to compute each output pixel is not a simple sum of a Hadamard product. Instead, each output pixel is calculated from operations performed on known-size windows into two input matrices, both centered on that pixel.
I have two input matrices ("images"), like
A = [[179, 97, 77, 118, 144, 105],
[ 68, 56, 184, 210, 141, 230],
[178, 166, 218, 47, 106, 172],
[ 38, 183, 50, 185, 48, 87],
[ 60, 200, 228, 232, 6, 190],
[253, 75, 231, 166, 117, 134]]
B = [[116, 95, 94, 220, 80, 223],
[135, 9, 166, 78, 5, 129],
[102, 167, 120, 81, 141, 29],
[ 83, 117, 81, 129, 255, 48],
[130, 231, 165, 7, 187, 169],
[ 44, 137, 16, 50, 229, 202]]
And in the output matrix, each [i, j] pixel should be calculated as the sum of all A[u, v] ** 2 - B[u, v] ** 2 values for [u, v] coordinates within the 3x3 "windows" onto the two (same-sized) input matrices.
How can I calculate this output matrix quickly in Python?
Using numpy, it seems to be the 3x3 sums of A * A - B * B, but how to do those sums? Or is there another "2d map" process I could be using?
I've written a loop-based solution to calculate the expected output for these two examples:
A = np.array(A)  # convert the nested lists above to arrays
B = np.array(B)
W = 3  # size of kernel is WxW
out = np.zeros(A.shape)
difference_of_squares = A * A - B * B
for i, j in np.ndindex(out.shape):
    # use smaller windows at the input's boundaries so the output has the same
    # dimensions as the input; I'm not worried at this point about what happens
    # at boundaries -- standard convolution solutions often just reduce the
    # output size or pad the input with zeroes
    starti = max(i - W//2, 0)
    stopi = min(i - W//2 + W, out.shape[0])
    startj = max(j - W//2, 0)
    stopj = min(j - W//2 + W, out.shape[1])
    out[i, j] = np.sum(difference_of_squares[starti:stopi, startj:stopj])
print(out)
[[ 8423. 11816. 10372. 41125. 35287. 31747.]
[ 29370. 65887. 38811. 61252. 51033. 51845.]
[ 24756. 60119. 109133. 35101. 70005. 18757.]
[ 8641. 62463. 126935. 14530. 2255. -64752.]
[ 36623. 110426. 163513. 33812. -50035. -146450.]
[ 22268. 100132. 130190. 83010. -10163. -88994.]]

You can use scipy.signal.convolve2d:
from scipy.signal import convolve2d
# Same shape as original (6x6)
>>> convolve2d(A**2-B**2, np.ones((3, 3), dtype=int), mode='same')
array([[ 8423, 11816, 10372, 41125, 35287, 31747],
[ 29370, 65887, 38811, 61252, 51033, 51845],
[ 24756, 60119, 109133, 35101, 70005, 18757],
[ 8641, 62463, 126935, 14530, 2255, -64752],
[ 36623, 110426, 163513, 33812, -50035, -146450],
[ 22268, 100132, 130190, 83010, -10163, -88994]])
# Shape reduced to 4x4 ('valid' keeps only full overlaps)
>>> convolve2d(A**2-B**2, np.ones((3, 3), dtype=int), mode='valid')
array([[ 65887, 38811, 61252, 51033],
[ 60119, 109133, 35101, 70005],
[ 62463, 126935, 14530, 2255],
[110426, 163513, 33812, -50035]])
Note: You have to play around with the mode and boundary parameters until you get what you want.
Update
If the border is not a problem at this point, you can use sliding_window_view:
from numpy.lib.stride_tricks import sliding_window_view
>>> np.sum(sliding_window_view(A**2-B**2, (3, 3)), axis=(2, 3))
array([[ 65887, 38811, 61252, 51033],
[ 60119, 109133, 35101, 70005],
[ 62463, 126935, 14530, 2255],
[110426, 163513, 33812, -50035]])
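If you later do want a same-size (6x6) output with zeros at the borders, matching the loop solution above, one option is to zero-pad before taking the windows; a sketch, assuming A and B are numpy arrays:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

D = A**2 - B**2
padded = np.pad(D, 1)  # zero-pad one pixel on each side
out = np.sum(sliding_window_view(padded, (3, 3)), axis=(2, 3))  # 6x6, matches the loop output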

Related

I'm not able to get the correct homography points

I'm trying to create a homography matrix from this field:
(source image with ids)
to the destination image:
(destination image)
The points for the source points are:
pts_src = [[ 761, 704],
[ 910, 292],
[1109, 544],
[ 619, 479],
[ 656, 373],
[1329, 446],
[ 20, 559],
[ 87, 664],
[ 238, 501],
[ 399, 450]]
And the points for destination points (new image):
pts_dst = [[147, 330],
[ 35, 20],
[147, 225],
[ 75, 203],
[ 35, 155],
[147, 155],
[ 35, 317],
[ 75, 351],
[ 35, 237],
[ 35, 203]]
I tried to create the homography matrix with the following code:
import cv2
import numpy as np

pts_src = np.array(pts_src)
pts_dst = np.array(pts_dst)
h, status = cv2.findHomography(pts_src, pts_dst)
print(h)  # homography matrix
And I got the following homography matrix:
[[ 4.00647822e-01 1.41196305e+00 -6.90548584e+02]
[-1.28068526e-01 3.03783700e+00 -6.98945354e+02]
[ 3.12182175e-04 4.06980322e-03 1.00000000e+00]]
I tried to check if the homography matrix is correct, so I used the first coordinate from the source image (761, 704) and checked whether I get the correct coordinate in the destination image, which should be (147, 330). I tried to use the equation new_x, new_y, new_z = h*(x, y, z):
p = [761, 704, 1]
print(np.matmul(h, p))
And I got:
array([ 608.36639573, 1342.23174648, 4.1027121 ])
Which is very far (and for some reason the z is 4.1).
And I also tried the second point (910, 292), and I should get (35 , 20), but I got [86.33414416, 71.5606917 , 2.47246832].
Any idea why I'm not able to get the correct coordinates?
The solution was mentioned in the comments thanks to @fana: divide the x and y coordinates by z (x/z, y/z) to convert back from homogeneous coordinates.
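For reference, a minimal check using the matrix and first point from above:
import numpy as np

p = np.array([761, 704, 1])
x, y, z = h @ p         # h is the homography computed above
print(x / z, y / z)     # roughly (148.3, 327.2), close to the expected (147, 330)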

Make an array like numpy.array() without numpy

I have an image-processing task and we're prohibited from using NumPy, so we need to code it from scratch. I've done the logic of the image transformation, but now I'm stuck on creating an array without NumPy.
So here's my last output:
new_log = [[236, 232, 226, ..., 198, 204]]
I need to convert this to an array so I can write the image like this (with Numpy)
new_log =
array([[236, 232, 226, ..., 208, 209, 212],
[202, 197, 187, ..., 198, 200, 203],
[192, 188, 180, ..., 205, 206, 207],
...,
[233, 226, 227, ..., 172, 189, 199],
[235, 233, 228, ..., 175, 182, 192],
[235, 232, 228, ..., 195, 198, 204]], dtype=uint8)
cv.imwrite('log_transformed.jpg', new_log)
# new_log must be shaped like the second output
You can make a straightforward function to take your list and reshape it in a similar way to NumPy's np.reshape(). But it's not going to be fast, and it doesn't know anything about data types (NumPy's dtype), so my advice is to challenge whoever it is that doesn't like NumPy. Especially if you're using OpenCV: it depends on NumPy!
Here's an example of what you could do in pure Python:
def reshape(l, shape):
    """Reshape a list.

    Example
    -------
    >>> l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> reshape(l, shape=(3, -1))
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    """
    nrows, ncols = shape
    if ncols == -1:
        ncols = len(l) // nrows
    if nrows == -1:
        nrows = len(l) // ncols
    array = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            row.append(l[ncols*r + c])
        array.append(row)
    return array
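For example, with a hypothetical flat list of grayscale values (the names here are made up for illustration):
pixels = [236, 232, 226, 208, 209, 212]
rows = reshape(pixels, shape=(2, -1))
# rows == [[236, 232, 226], [208, 209, 212]]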

Python find convolution kernel if input image and output image is known

I have a problem with a convolution kernel in Python. It is about a simple convolution operator. I have an input matrix and an output matrix, and I want to find a possible convolution kernel with size (5x5). How can I solve this problem with Python, NumPy or TensorFlow?
import numpy as np
import scipy.signal as ss

input_img = np.array([[94, 166, 76, 106, 152, 232],
                      [48, 242, 30, 98, 46, 210],
                      [52, 60, 86, 60, 216, 248],
                      [52, 236, 116, 240, 224, 184],
                      [138, 160, 146, 254, 236, 252],
                      [94, 100, 224, 246, 152, 74]], dtype=float)
output_img = np.array([[15, 49, 23, 105, 0, 0],
                       [43, 30, 108, 124, 0, 0],
                       [58, 120, 112, 92, 0, 0],
                       [73, 127, 118, 126, 0, 0],
                       [112, 123, 76, 37, 0, 0],
                       [0, 0, 0, 0, 0, 0]], dtype=float)
# I want to find this kernel
conv = np.zeros((5, 5), dtype=int)
# So that the convolution below reproduces the output_img defined above
output_img = ss.convolve2d(input_img, conv, mode='same')
As far as I understood, you need to reconstruct the window weights given the input array, the output array, and the window size. This is possible, I think, especially if the input array (image) is sufficiently big: each output pixel is a linear combination of the csize * csize window weights, so collecting enough windows yields a linear system that can be solved for those weights.
Look at the code below:
import scipy.signal as ss
import numpy as np

source_dataset = np.random.rand(20, 10)
sample_convolution = np.diag([1, 1, 1])
output_dataset = ss.convolve2d(source_dataset, sample_convolution, mode='same')
conv_size = sample_convolution.shape[0]

# Given output_dataset, source_dataset, and conv_size we need to reconstruct
# the window weights.
def reconstruct(data, output, csize):
    half_size = csize // 2
    min_row_ind = half_size
    max_row_ind = data.shape[0] - half_size
    min_col_ind = half_size
    max_col_ind = data.shape[1] - half_size
    A = list()
    b = list()
    for i in range(min_row_ind, max_row_ind):
        for j in range(min_col_ind, max_col_ind):
            # each window contributes one linear equation in the csize*csize weights
            A.append(data[(i - half_size):(i + half_size + 1),
                          (j - half_size):(j + half_size + 1)].ravel().tolist())
            b.append(output[i, j])
            if len(A) >= csize * csize and np.linalg.matrix_rank(A) == csize * csize:
                return (np.linalg.pinv(A) @ np.array(b)[:, np.newaxis]).reshape(csize, csize)
    raise Exception("Insufficient data")

result = reconstruct(source_dataset, output_dataset, 3)
I got the following result
array([[ 1.00000000e+00, -1.77635684e-15, -1.11022302e-16],
[ 0.00000000e+00, 1.00000000e+00, -8.88178420e-16],
[ 0.00000000e+00, -1.22124533e-15, 1.00000000e+00]])
So, it works as expected, but it definitely needs to be improved to take edge effects into account, handle even window sizes, etc. Note also that convolve2d flips the kernel, so this procedure actually recovers the 180°-rotated kernel; that makes no difference for the symmetric identity kernel used above.
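As a side note, once the equations are collected, np.linalg.lstsq computes the same least-squares solution without forming an explicit pseudoinverse; a minimal sketch, where A_rows and b_vals stand for the lists built in the loop above:
import numpy as np

def solve_weights(A_rows, b_vals, csize):
    # least-squares solution of A @ w = b for the csize*csize window weights
    w, residuals, rank, sv = np.linalg.lstsq(np.array(A_rows), np.array(b_vals), rcond=None)
    return w.reshape(csize, csize)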

OpenCV local pixel average generating strange output

I am trying to use Python to compute a simple local pixel color average, however my output is not that at all.
Image:
Output:
Code:
import cv2

image = cv2.imread('perspective.jpeg')
for i in range(image.shape[1]):
    for j in range(image.shape[0]):
        up = image[min(j + 1, image.shape[0] - 1), i]
        down = image[max(j - 1, 0), i]
        right = image[j, min(i + 1, image.shape[1] - 1)]
        left = image[j, max(i - 1, 0)]
        average = (up + down + left + right + image[j, i]) / 5
        image[j, i] = average
The issue that you are observing is due to integer arithmetic overflow while computing the average. The reason for the overflow is that the pixels are of type np.uint8, and when they are added together the result is also of type np.uint8, which is not large enough to hold the result of the addition.
The solution to this problem is to cast the pixels to a larger data type before adding them, then cast the final value back to np.uint8 before storing it in the result image.
In fact, casting only one of the values (say up) to a larger data type will suffice, as the rest of them will automatically be upgraded while performing the addition.
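A quick demonstration of the wraparound (recent NumPy versions may also emit an overflow warning here):
import numpy as np

a = np.uint8(200)
b = np.uint8(100)
print(a + b)              # 44, not 300: the uint8 sum wraps around modulo 256
print(np.float32(a) + b)  # 300.0 once one operand is a wider type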
The corrected code may look like this:
import cv2
import numpy as np

image = cv2.imread('perspective.jpeg')
for i in range(image.shape[1]):
    for j in range(image.shape[0]):
        up = np.float32(image[min(j + 1, image.shape[0] - 1), i])  # widen once; the others follow
        down = image[max(j - 1, 0), i]
        right = image[j, min(i + 1, image.shape[1] - 1)]
        left = image[j, max(i - 1, 0)]
        average = (up + down + left + right + image[j, i]) / 5
        image[j, i] = np.uint8(average)
You can easily do this with filter2D as shown in the example below. It will work on any number of channels.
im = np.random.randint(0, 256, (5, 5), np.uint8)
kernel = np.array([[0, 1./5, 0], [1./5, 1./5, 1./5], [0, 1./5, 0]])
filt = cv2.filter2D(im, cv2.CV_8U, kernel)
For example:
im
array([[ 14, 127, 221, 74, 2],
[132, 251, 88, 19, 215],
[183, 140, 17, 60, 76],
[208, 144, 182, 11, 64],
[183, 89, 217, 131, 23]], dtype=uint8)
filt
array([[106, 173, 120, 67, 116],
[166, 148, 119, 91, 66],
[161, 147, 97, 37, 95],
[172, 153, 114, 90, 37],
[155, 155, 160, 79, 83]], dtype=uint8)
You can choose the border type, I've used the default.
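For instance, to replicate the edge pixels instead of using the default border, you could pass borderType explicitly; a small sketch:
filt = cv2.filter2D(im, cv2.CV_8U, kernel, borderType=cv2.BORDER_REPLICATE)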

numpy with python: convert 3d array to 2d

Say that I have a color image, and naturally this will be represented by a 3-dimensional array in python, say of shape (n x m x 3) and call it img.
I want a new 2-d array, call it "narray" to have a shape (3,nxm), such that each row of this array contains the "flattened" version of R,G,and B channel respectively. Moreover, it should have the property that I can easily reconstruct back any of the original channel by something like
narray[0,].reshape(img.shape[0:2]) #so this should reconstruct back the R channel.
The question is how can I construct "narray" from "img"? The simple img.reshape(3, -1) does not work, as the order of the elements is not what I want.
Thanks
You need to use np.transpose to rearrange the dimensions. Here, n x m x 3 is to be converted to 3 x (n*m), so send the last axis to the front while keeping the order of the remaining axes (0, 1). Finally, reshape to have 3 rows. Thus, the implementation would be -
img.transpose(2,0,1).reshape(3,-1)
Sample run -
In [16]: img
Out[16]:
array([[[155, 33, 129],
[161, 218, 6]],
[[215, 142, 235],
[143, 249, 164]],
[[221, 71, 229],
[ 56, 91, 120]],
[[236, 4, 177],
[171, 105, 40]]])
In [17]: img.transpose(2,0,1).reshape(3,-1)
Out[17]:
array([[155, 161, 215, 143, 221, 56, 236, 171],
[ 33, 218, 142, 249, 71, 91, 4, 105],
[129, 6, 235, 164, 229, 120, 177, 40]])
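Going the other way, the original image (and hence any single channel) can be reconstructed by undoing the reshape and the transpose; a sketch, with narray being the 3 x (n*m) result from above:
restored = narray.reshape(3, img.shape[0], img.shape[1]).transpose(1, 2, 0)
# restored == img, and narray[0].reshape(img.shape[:2]) recovers the R channel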
[ORIGINAL ANSWER]
Let's say we have an array img of size m x n x 3 to transform into an array new_img of size 3 x (m*n).
Initial Solution:
new_img = img.reshape((img.shape[0]*img.shape[1]), img.shape[2])
new_img = new_img.transpose()
[EDITED ANSWER]
Flaw: The reshape starts from the first dimension and reshapes the remainder; this solution has the potential to mix the values from the third dimension, which in the case of images could be semantically incorrect.
Adapted Solution:
# Dimensions: [m, n, 3]
new_img = img.transpose()
# Dimensions: [3, n, m]
new_img = new_img.reshape(new_img.shape[0], new_img.shape[1] * new_img.shape[2])
Strict Solution:
# Dimensions: [m, n, 3]
new_img = img.transpose((2, 0, 1))
# Dimensions: [3, m, n]
new_img = new_img.reshape(new_img.shape[0], new_img.shape[1] * new_img.shape[2])
The Strict solution is the better way forward as it accounts for the order of the dimensions; the Adapted and Strict results contain the same values (e.g. set(new_img[0, ...]) is identical), but in a shuffled order.
If you have scikit-image installed, then you can use rgb2gray (older versions also spelled it rgb2grey) to convert a photo from color to grayscale (from 3D to 2D):
from skimage import io, color

lina_color = io.imread(path + img)
lina_gray = color.rgb2gray(lina_color)
In [33]: lina_color.shape
Out[33]: (1920, 1280, 3)
In [34]: lina_gray.shape
Out[34]: (1920, 1280)
