ROI augmentation in keras: scipy.ndimage transformations - python

I've got an image with regions of interest.
I would like to apply random transformations to this image, while keeping the regions of interest correct.
My code is taking a list of boxes in this format [x_min, y_min, x_max, y_max].
It then converts each box into a list of vertices [up_left, up_right, down_right, down_left]. The result is a list of vectors, so I can apply the transformation to them directly.
The next step is looking for the new [x_min, y_min, x_max, y_max] in the list of transformed vertices.
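The two conversion steps described above can be sketched as follows. The real helpers live in the linked pastebin and are not shown in the question, so the bodies of boxes_to_vertices and vertices_to_boxes below are assumptions that only follow the description (corner order up_left, up_right, down_right, down_left).

```python
import numpy as np


def boxes_to_vertices(boxes):
    """Convert (N, 4) boxes [x_min, y_min, x_max, y_max] to (N, 4, 2) corners."""
    boxes = np.asarray(boxes, dtype=float)
    x_min, y_min, x_max, y_max = boxes.T
    # Corner order: up_left, up_right, down_right, down_left
    return np.stack([
        np.stack([x_min, y_min], axis=1),
        np.stack([x_max, y_min], axis=1),
        np.stack([x_max, y_max], axis=1),
        np.stack([x_min, y_max], axis=1),
    ], axis=1)


def vertices_to_boxes(vertices):
    """Find the axis-aligned box enclosing each group of four vertices."""
    vertices = np.asarray(vertices, dtype=float).reshape(-1, 4, 2)
    mins = vertices.min(axis=1)  # [x_min, y_min] per box
    maxs = vertices.max(axis=1)  # [x_max, y_max] per box
    return np.hstack([mins, maxs])
```

Because the new box is the bounding box of the transformed corners, it will generally grow after a rotation.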
My first application was rotations, and they work fine.
Here is the corresponding code. The first part is taken from the keras codebase, scroll down to the NEW CODE comment.
If I get the code to work, I would be interested in integrating it into keras. So I'm trying to integrate my code into their image preprocessing infrastructure:
def random_rotation_with_boxes(x, boxes, rg, row_axis=1, col_axis=2, channel_axis=0,
                               fill_mode='nearest', cval=0.):
    """Performs a random rotation of a Numpy image tensor.
    Also rotates the corresponding bounding boxes.
    # Arguments
        x: Input tensor. Must be 3D.
        boxes: a list of bounding boxes [xmin, ymin, xmax, ymax], values in [0,1].
        rg: Rotation range, in degrees.
        row_axis: Index of axis for rows in the input tensor.
        col_axis: Index of axis for columns in the input tensor.
        channel_axis: Index of axis for channels in the input tensor.
        fill_mode: Points outside the boundaries of the input
            are filled according to the given mode
            (one of `{'constant', 'nearest', 'reflect', 'wrap'}`).
        cval: Value used for points outside the boundaries
            of the input if `mode='constant'`.
    # Returns
        Rotated Numpy image tensor,
        the rotated bounding boxes and their vertices.
    """
    # sample parameter for augmentation
    theta = np.pi / 180 * np.random.uniform(-rg, rg)
    # apply to image
    rotation_matrix = np.array([[np.cos(theta), -np.sin(theta), 0],
                                [np.sin(theta), np.cos(theta), 0],
                                [0, 0, 1]])
    h, w = x.shape[row_axis], x.shape[col_axis]
    transform_matrix = transform_matrix_offset_center(rotation_matrix, h, w)
    x = apply_transform(x, transform_matrix, channel_axis, fill_mode, cval)
    # -------------------------------------------------
    # NEW CODE FROM HERE
    # -------------------------------------------------
    # apply to vertices
    vertices = boxes_to_vertices(boxes)
    vertices = vertices.reshape((-1, 2))
    # apply offset to have pivot point at [0.5, 0.5]
    vertices -= [0.5, 0.5]
    # apply rotation, we only need the rotation part of the matrix
    vertices = np.dot(vertices, rotation_matrix[:2, :2])
    vertices += [0.5, 0.5]
    boxes = vertices_to_boxes(vertices)
    return x, boxes, vertices
As you can see, they are using scipy.ndimage to apply the transformation to an image.
My bounding boxes have coordinates in [0,1], so the center is [0.5, 0.5]. The rotation needs to be applied around [0.5, 0.5] as the pivot point. It would be possible to use homogeneous coordinates and matrices to shift, rotate and shift back the vectors. That's what they do for the image.
There is an existing transform_matrix_offset_center function, but that offsets to float(width)/2 + 0.5. The +0.5 makes this unsuitable for my coordinates in [0, 1]. So I'm shifting the vectors myself.
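For illustration, the shift, rotate and shift back sequence in homogeneous coordinates could look like the sketch below. The function name rotate_about_pivot is hypothetical, and the sketch treats points as column vectors, which is a choice made here and not necessarily the convention of the Keras helpers.

```python
import numpy as np


def rotate_about_pivot(points, theta, pivot=(0.5, 0.5)):
    """Rotate (N, 2) points by theta radians around pivot (hypothetical helper)."""
    px, py = pivot
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    to_origin = np.array([[1, 0, -px], [0, 1, -py], [0, 0, 1]], dtype=float)
    rotation = np.array([[cos_t, -sin_t, 0], [sin_t, cos_t, 0], [0, 0, 1]])
    back = np.array([[1, 0, px], [0, 1, py], [0, 0, 1]], dtype=float)
    # Matrices apply right to left: shift to origin, rotate, shift back
    matrix = back @ rotation @ to_origin
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (matrix @ homogeneous.T).T[:, :2]
```

With this convention the pivot maps to itself, and the point [1.0, 0.5] rotated by 90 degrees ends up at [0.5, 1.0].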
For rotations, this code works fine.
I thought it would be generally applicable.
But for zooming, this fails in a strange way. The code is pretty much identical:
vertices -= [0.5, 0.5]
# apply zoom, we only need the zoom part of the matrix
vertices = np.dot(vertices, zoom_matrix[:2, :2])
vertices += [0.5, 0.5]
The output is this:
There seem to be several problems:
The shifting is broken. In image 1, the ROI and the corresponding image part barely overlap.
The coordinates seem to be switched. In image 2, the ROI and the image seem to be scaled differently along the x and y axes.
I tried switching the axes by using (zoom_matrix[:2, :2].T)[::-1, ::-1].
That leads to this:
Now the scale factor seems to be broken. I've tried many variations on this matrix multiplication (transposing, mirroring, changing the scale factors, etc.) and can't seem to get it right.
And in any case, I think that the original code should be correct. It works for rotations, after all.
At this point, I'm wondering whether this is a peculiarity of scipy's ndimage resampling.
Is this a mistake in my math, or is something missing to truly emulate the scipy ndimage resampling?
I've put the full source code on pastebin.
Only small parts are updated by me, actually this is code from keras:
https://pastebin.com/tsHnLLgy
The code to use the new augmentations and create these images is here:
https://nbviewer.jupyter.org/gist/lhk/b8f30e9f30c5d395b99188a53524c53e
UPDATE:
If the zoom factors are inverted, the transformation works.
For zoom, this operation is simple and can be expressed as:
# vertices is an array of shape [number of vertices, 2]
vertices *= [1/zx, 1/zy]
This corresponds to applying the inverse transform to the vertices.
In the context of image resampling, this could make sense. An image could be resampled like this:
Create a coordinate vector for every pixel.
Apply the inverse transform to every vector.
Interpolate the original image to find the value the vector is now pointing at.
Write this value to the output image at the original position.
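The resampling steps above match how scipy.ndimage.affine_transform is documented to work: the matrix maps each output pixel index to the input position that is sampled. The small experiment below is a sketch of that behaviour on a synthetic 8x8 image (the image and the factor 2 are made up for illustration).

```python
import numpy as np
from scipy import ndimage

# Synthetic image with a single bright pixel at (row, col) = (4, 4)
image = np.zeros((8, 8))
image[4, 4] = 1.0

# The matrix maps OUTPUT coordinates to INPUT coordinates
matrix = np.diag([2.0, 2.0])
resampled = ndimage.affine_transform(image, matrix, order=0)

# Location of the bright pixel in the output image
bright = np.unravel_index(resampled.argmax(), resampled.shape)

# A point that should follow the image content needs the inverse matrix
forward = np.linalg.inv(matrix) @ np.array([4.0, 4.0])
```

Here the bright pixel moves from (4, 4) to (2, 2): a matrix with factor 2 shrinks the content, and a point tracked alongside it has to be multiplied by 1/2.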
But for rotation, I didn't invert the matrix and the operation worked correctly.
The question itself, how to fix this, seems answered. But I don't understand why.
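One possible explanation, offered as a sketch and not as a confirmed diagnosis of the Keras code: np.dot(vertices, M) with vertices stored as rows is the same as applying the transpose of M to column vectors. For a rotation matrix the transpose equals the inverse, so the rotation code was already applying the inverse transform without saying so. A diagonal zoom matrix is its own transpose, so no inversion happens and it has to be done explicitly. The angle, zoom factors and points below are arbitrary illustration values.

```python
import numpy as np

theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
zoom = np.diag([2.0, 0.5])
vertices = np.array([[0.25, 0.0], [0.0, -0.5]])

# Row vectors times R equals R transposed applied to column vectors,
# and for a rotation the transpose is the inverse
rotated_rows = vertices @ rotation
rotated_inverse = (np.linalg.inv(rotation) @ vertices.T).T

# For a diagonal zoom matrix the transpose changes nothing,
# so the row-vector product is NOT the inverse transform
zoomed_rows = vertices @ zoom
zoomed_inverse = (np.linalg.inv(zoom) @ vertices.T).T

print(np.allclose(rotated_rows, rotated_inverse))
print(np.allclose(zoomed_rows, zoomed_inverse))
```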

Related

fitting ellipse in images with poor contrast

I am working on image processing in Python, on the topic of underwater photogrammetry. My goal is to fit an ellipse to fiducial markers and retrieve its a) center, b) axes and c) orientation.
My markers are
radial,
white on black background, and some have a
binary code:
An ML model delivers a small image snippet for each marker in each image, containing only the center of the marker.
So far, I've implemented these approaches:
Using openCV:
a) Thresholding, which results in a binary image (cv2.threshold)
b) Find Contours (cv2.findContours)
c) Fit ellipse (cv2.fitEllipse)
Using Scikit:
a) Detect edge (using Canny)
b) Apply hough transform
Star operator (work in progress)
a) Estimate ellipse center
b) Send 360 rays in all directions
c) Build an array, comprising coordinates of the largest gradient on each ray
d) Calculate best-fit ellipse using least-square method
e) Use the new center to repeat process (possibly several iterations required)
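Step d) of the star operator can be sketched with a plain algebraic least-squares conic fit. This is only one of several ways to fit an ellipse and is not necessarily what the asker implemented. The function name fit_ellipse_center is hypothetical, and the normalisation used (right-hand side fixed to 1) assumes the ellipse does not pass through the coordinate origin.

```python
import numpy as np


def fit_ellipse_center(x, y):
    """Fit a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1 and return the conic center."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([x**2, x * y, y**2, x, y])
    a, b, c, d, e = np.linalg.lstsq(design, np.ones_like(x), rcond=None)[0]
    # The center is where the gradient of the conic vanishes
    return np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])


# Synthetic edge points on an ellipse centered at (10, 20), rotated by 30 degrees
t = np.linspace(0, 2 * np.pi, 360, endpoint=False)
phi = np.deg2rad(30)
edge_x = 10 + 5 * np.cos(t) * np.cos(phi) - 3 * np.sin(t) * np.sin(phi)
edge_y = 20 + 5 * np.cos(t) * np.sin(phi) + 3 * np.sin(t) * np.cos(phi)
center = fit_ellipse_center(edge_x, edge_y)
```

With real gradient maxima the points are noisy, so the recovered center would then be used as the new ray origin for the next iteration, as in step e).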
I perform these methods for each color channel separately. So far, the ellipse centers differ by several pixels between channels.
Do you have any suggestions on what pre-processing methods I should use prior to detecting/fitting the ellipse?
Any thoughts on which of the above methods will lead to the most accurate results?
This is amazing! Thank you. I just started to read about moments (e.g. https://www.pythonpool.com/opencv-moments/) and inertia.
However, there is a challenge applying your code to this example:
As you can see, the image was poorly cropped, and the center of mass of the image is closer to the image center than to the center of the expected ellipse.
My first attempt to fix this is to binarize the image first:
import cv2
T = int(cv2.mean(image)[0])
ret, image = cv2.threshold(image, T, 255, cv2.THRESH_BINARY)
Is that a reasonable approach? I fear that the binarization will have an unwanted impact on the moments of inertia. Thank you for clarifying.
This code finds the center of mass of the image and the main axes of symmetry by calculating the moments of inertia.
I tried many libraries that calculate moments of inertia of images, but they give strange results (like a 4x4 matrix for what should be a 2x2 matrix of inertia).
Also, ndimage.measurements.center_of_mass() appears to return (Cy, Cx), i.e. (row, column).
So I resorted to calculating the moments of inertia manually:
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image as Pim
from io import BytesIO
import requests

photoURL = "https://i.stack.imgur.com/EcLYk.png"
response = requests.get(photoURL)
image = np.array(Pim.open(BytesIO(response.content)).convert('L'))  # Convert to greyscale
plt.imshow(image)

# Calculate eigenvectors = main axes of inertia
# xCoord, yCoord are the column and row numbers in image
xCoord, yCoord = np.meshgrid(np.arange(image.shape[1]), np.arange(image.shape[0]))
# Mass M is the total sum of image
M = np.sum(image)
# Cx, Cy are the coordinates of the center of mass
# Cx = sum(xCoord * image) / sum(image)
Cx = np.einsum('ij,ij', xCoord, image) / M
Cy = np.einsum('ij,ij', yCoord, image) / M
# Ixx is the second-order moment of image about the horizontal axis through the center of mass
# Ixx = sum(image * y^2)
Ixx = np.einsum('ij,ij,ij', yCoord - Cy, yCoord - Cy, image)
# Iyy is the second-order moment of image about the vertical axis through the center of mass
# Iyy = sum(image * x^2)
Iyy = np.einsum('ij,ij,ij', xCoord - Cx, xCoord - Cx, image)
# Ixy is the mixed second-order moment of image about both axes through the center of mass
# Ixy = sum(image * x * y)
Ixy = np.einsum('ij,ij,ij', xCoord - Cx, yCoord - Cy, image)
inertiaMatrix = np.array([[Ixx, Ixy],
                          [Ixy, Iyy]])
eigValues, eigVectors = np.linalg.eig(inertiaMatrix)
# Plot center of mass
plt.scatter(Cx, Cy, c='r')
# Plot eigenvectors as arrows starting at the center of mass
plt.quiver(Cx, Cy, eigVectors[0, 0], eigVectors[1, 0], color='r', scale=2)
plt.quiver(Cx, Cy, eigVectors[0, 1], eigVectors[1, 1], color='r', scale=2)
plt.show()
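Since the question also asks for the orientation, here is a minimal sketch of the standard central-moment formula for the principal-axis angle, theta = 0.5 * atan2(2*mu11, mu20 - mu02). The function name principal_axis_angle is hypothetical, and the angle is expressed in image coordinates (x to the right, y downward), which is an assumption about the convention wanted.

```python
import numpy as np


def principal_axis_angle(image):
    """Return (cx, cy, theta) from image moments; theta is in radians."""
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    mass = image.sum()
    cx = (cols * image).sum() / mass
    cy = (rows * image).sum() / mass
    # Second-order central moments
    mu20 = ((cols - cx) ** 2 * image).sum()
    mu02 = ((rows - cy) ** 2 * image).sum()
    mu11 = ((cols - cx) * (rows - cy) * image).sum()
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return cx, cy, theta
```

A horizontal bar gives an angle of 0, and a line along the main diagonal of the array gives pi/4.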

OpenCV Shape Detection To Shape Transformation (Python)

How can I take an image of a square that has been detected using a shape detection algorithm in OpenCV and "transform" it into a triangle the quickest way possible?
For example, say one of the images from Google is a square and I want to find the fastest way to turn it into a triangle. How would I go about researching this? I have looked up shape transformation for OpenCV, but it mostly covers zooming in on the image and changing views.
One way to distort a rectangle to a triangle is to use a perspective transformation in Python/OpenCV.
Read the input
Get the input control points as the 4 corners of the input image
Define the output control points so that the top 2 points are close to the top center of the output (-+1 or whatever separation you want) and the bottom 2 points are the same as the input bottom two points
Compute the perspective transformation matrix from the control points
Warp the input to the output
Save the result.
Input:
import cv2
import numpy as np
# Read source image.
img_src = cv2.imread('lena.png')
hh, ww = img_src.shape[:2]
# Four corners of source image ordered clockwise from top left corner
# Coordinates are in x,y system with x horizontal to the right and y vertical downward
pts_src = np.float32([[0,0], [ww-1,0], [ww-1,hh-1], [0,hh-1]])
# Four corners of destination image.
pts_dst = np.float32([[ww/2-1, 0], [ww/2+1,0], [ww-1,hh-1], [0,hh-1]])
# Get perspective matrix from the 4 point pairs
m = cv2.getPerspectiveTransform(pts_src, pts_dst)
# Warp source image to destination based on matrix
# (flags is passed by keyword; the fourth positional argument of warpPerspective is dst)
img_out = cv2.warpPerspective(img_src, m, (ww, hh), flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT, borderValue=(255, 255, 255))
# Save output
cv2.imwrite('lena_triangle.png', img_out)
# Display result
cv2.imshow("Warped Source Image", img_out)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result (though likely not what you want).
If you separate the top two output points by a larger difference, then it will look more like the input.
For example, using ww/2-10 and ww/2+10 rather than ww/2-1 and ww/2+1 for the top two output points, I get:

Image masks (polygons) need to be fed into a grid to obtain the percentage of the grid pixels covered by the polygon

I have a mask in .ply format that I am working with using pymangle quite happily. However, I want to plug all the polygons in the mask to a 2x2 grid with 1/100 or so subpixels so that I get the percentage of coverage of each grid pixel due to the mask. I do not know how to approach this. Is it the same as the mask weights?
For each pixel I should get a value ranging from 0 to 1 depending on how much of the pixel is covered by the mask.
SOLVED using pymangle over a numpy meshgrid.
pymangle has a 'weight' function that can be used to get the weight of the mask at a coordinate point (0 if the mask doesn't exist there).
mask = pymangle.Mangle('/path/to/file.ply')
# Here ra, dec is a numpy meshgrid
M = mask.weight(ra, dec)
# reshape returns a new array, so the result has to be assigned
M = M.reshape(ra.shape)
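To turn point weights into a per-pixel coverage fraction, one option is to sample each grid cell on a regular sub-grid and average. The sketch below is numpy-only: weight_fn is a hypothetical stand-in for a call such as mask.weight, coverage_fraction is an invented name, and the grid is treated as flat (no cos(dec) area correction), which is an assumption.

```python
import numpy as np


def coverage_fraction(weight_fn, x_edges, y_edges, n_sub=10):
    """Fraction of each grid cell covered, estimated from n_sub x n_sub samples."""
    x_edges = np.asarray(x_edges, dtype=float)
    y_edges = np.asarray(y_edges, dtype=float)
    nx, ny = len(x_edges) - 1, len(y_edges) - 1
    # Sub-pixel centers as fractions of a cell width
    offsets = (np.arange(n_sub) + 0.5) / n_sub
    xs = (x_edges[:-1, None] + offsets * np.diff(x_edges)[:, None]).ravel()
    ys = (y_edges[:-1, None] + offsets * np.diff(y_edges)[:, None]).ravel()
    xx, yy = np.meshgrid(xs, ys)
    covered = np.asarray(weight_fn(xx.ravel(), yy.ravel())) > 0
    # Group the samples by cell and average within each cell
    covered = covered.reshape(ny, n_sub, nx, n_sub)
    return covered.mean(axis=(1, 3))


# Toy mask covering the half-plane x < 0.5 (stand-in for a real mask)
def half_plane(x, y):
    return (x < 0.5).astype(float)


fractions = coverage_fraction(half_plane, [0, 1, 2], [0, 1, 2])
```

The result has one row per y cell and one column per x cell, with values between 0 and 1. A larger n_sub gives a finer estimate at the cost of more weight evaluations.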

How can I select the pixels that fall within a contour in an image represented by a numpy array?

I have a set of contour points drawn on an image which is stored as a 2D numpy array. The contours are represented by 2 numpy arrays of float values, one for the x and one for the y coordinates. These coordinates are not integers and do not align perfectly with pixels, but they do tell you the location of the contour points with respect to pixels.
I would like to be able to select the pixels that fall within the contours. I wrote some code that is pretty much the same as the answer given here: Access pixel values within a contour boundary using OpenCV in Python
temp_list = []
for a, b in zip(x_pixel_nos, y_pixel_nos):
    temp_list.append([[a, b]])  # 2D array of shape 1x2
temp_array = np.array(temp_list)
contour_array_list = []
contour_array_list.append(temp_array)
lst_intensities = []
# For each list of contour points...
for i in range(len(contour_array_list)):
    # Create a mask image that contains the contour filled in
    cimg = np.zeros_like(pixel_array)
    cv2.drawContours(cimg, contour_array_list, i, color=255, thickness=-1)
    # Access the image pixels and create a 1D numpy array then add to list
    pts = np.where(cimg == 255)
    lst_intensities.append(pixel_array[pts[0], pts[1]])
When I run this, I get the following error:
error: OpenCV(3.4.1) /opt/conda/conda-bld/opencv-suite_1527005509093/work/modules/imgproc/src/drawing.cpp:2515: error: (-215) npoints > 0 in function drawContours
I am guessing that at this point openCV will not work for me because my contours are floats, not integers, which openCV does not handle with drawContours. If I convert the coordinates of the contours to integers, I lose a lot of precision.
So how can I get at the pixels that fall within the contours?
This should be a trivial task but so far I was not able to find an easy way to do it.
I think that the simplest way of finding all pixels that fall within the contour is as follows.
The contour is described by a set of non-integer points. We can think of these points as vertices of a polygon, the contour is a polygon.
We first find the bounding box of the polygon. Any pixel outside of this bounding box is not inside the polygon, and doesn't need to be considered.
For the pixels inside the bounding box, we test if they are inside the polygon using the classical test: Trace a line from some point at infinity to the point, and count the number of polygon edges (line segments) crossed. If this number is odd, the point is inside the polygon. It turns out that Matplotlib contains a very efficient implementation of this algorithm.
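The crossing-count test just described can be sketched in a few lines of plain Python. This is an illustrative version with the ray running horizontally to the right of the point. It is not Matplotlib's implementation, and the name point_in_polygon is made up here.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd rule: count the edges crossed by a ray going right from (px, py)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y0 > py) != (y1 > py):
            # x coordinate where the edge meets that line
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > px:
                inside = not inside
    return inside


square = [(0.5, 0.5), (3.5, 0.5), (3.5, 3.5), (0.5, 3.5)]
print(point_in_polygon(2, 2, square))
print(point_in_polygon(4, 2, square))
```

Horizontal edges never straddle the line, so they are skipped and the division by y1 - y0 is safe.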
I'm still getting used to Python and Numpy, so this code might look a bit awkward to a Python expert, but I think what it does is straightforward. First it computes the bounding box of the polygon, then it creates an array points with the coordinates of all pixels that fall within this bounding box (I'm assuming the pixel centroid is what counts). It applies the matplotlib.path.Path.contains_points method to this array, yielding a boolean array mask. Finally, it reshapes this array to match the bounding box.
import math
import matplotlib.path
import numpy as np
x_pixel_nos = [...]
y_pixel_nos = [...] # Data from https://gist.github.com/sdoken/173fae1f9d8673ffff5b481b3872a69d
temp_list = []
for a, b in zip(x_pixel_nos, y_pixel_nos):
    temp_list.append([a, b])
polygon = np.array(temp_list)
left = np.min(polygon, axis=0)
right = np.max(polygon, axis=0)
x = np.arange(math.ceil(left[0]), math.floor(right[0])+1)
y = np.arange(math.ceil(left[1]), math.floor(right[1])+1)
xv, yv = np.meshgrid(x, y, indexing='xy')
points = np.hstack((xv.reshape((-1,1)), yv.reshape((-1,1))))
path = matplotlib.path.Path(polygon)
mask = path.contains_points(points)
mask.shape = xv.shape
After this code, what remains is to locate the bounding box within the image and color the pixels. The top-left pixel of mask corresponds to the image pixel at (math.ceil(left[0]), math.ceil(left[1])).
It is possible to improve the performance of this algorithm. If the ray traced to test a pixel is horizontal, you can imagine that all the pixels along a horizontal line can benefit from the work done for the pixels to the left. That is, it is possible to compute the in/out status for all pixels on an image line with a little bit more effort than the cost for a single pixel.
The matplotlib.path.Path.contains_points algorithm is much more efficient than performing a single-point test for all points, since sorting the polygon edges and vertices appropriately makes each test much cheaper, and that sorting only needs to be done once when testing many points at once. But this algorithm doesn't take into account that we want to test many points on the same line.
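The per-line idea can be sketched as a simple scanline fill: for every image row, compute where the polygon edges cross that row, sort the crossings, and mark the pixels between each pair. This is an illustrative numpy version under the assumption that pixel centers sit at integer coordinates. The function name scanline_mask is made up, and the code does not try to match Matplotlib's handling of points exactly on the boundary.

```python
import numpy as np


def scanline_mask(polygon, height, width):
    """Boolean mask of pixels whose centers (col, row) lie inside the polygon."""
    poly = np.asarray(polygon, dtype=float)
    x0, y0 = poly[:, 0], poly[:, 1]
    x1, y1 = np.roll(x0, -1), np.roll(y0, -1)  # end point of each edge
    cols = np.arange(width)
    mask = np.zeros((height, width), dtype=bool)
    for row in range(height):
        # Edges that straddle this row (horizontal edges never do)
        crosses = (y0 <= row) != (y1 <= row)
        xs = x0[crosses] + (row - y0[crosses]) * (x1[crosses] - x0[crosses]) / (
            y1[crosses] - y0[crosses])
        xs.sort()
        # Pixels between the 1st and 2nd crossing, 3rd and 4th, ... are inside
        for left, right in zip(xs[0::2], xs[1::2]):
            mask[row, (cols >= left) & (cols < right)] = True
    return mask


square_mask = scanline_mask([(0.5, 0.5), (3.5, 0.5), (3.5, 3.5), (0.5, 3.5)], 5, 5)
```

Each row costs one pass over the edges instead of one pass per pixel, which is the saving the paragraph above refers to.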
These are what I see when I do
pp.plot(x_pixel_nos, y_pixel_nos)
pp.imshow(mask)
after running the code above with your data. Note that the y axis is inverted with imshow, hence the vertically mirrored shapes.
With the help of the Shapely library in Python, this can be done easily:
from shapely.geometry import Point, Polygon
Convert the x, y coordinates to a Shapely Polygon:
coords = [(0, 0), (0, 2), (1, 1), (2, 2), (2, 0), (1, 1), (0, 0)]
pl = Polygon(coords)
Now find the pixels inside the polygon:
def pixels_in_polygon(pl):
    # Integer bounding box of the polygon
    minx, miny, maxx, maxy = pl.bounds
    minx, miny, maxx, maxy = int(minx), int(miny), int(maxx), int(maxy)
    box_patch = [[x, y] for x in range(minx, maxx + 1) for y in range(miny, maxy + 1)]
    pixels = []
    for pb in box_patch:
        pt = Point(pb[0], pb[1])
        # Keep only the pixels that lie strictly inside the polygon
        if pl.contains(pt):
            pixels.append([int(pb[0]), int(pb[1])])
    return pixels
Call this function for each set of coords, i.e. for each polygon.
Good to go :)
skimage.draw.polygon can handle this; see the example code in that function's documentation.
If you want just the contour, you can use skimage.segmentation.find_boundaries.

Render image on the surface with Mayavi and Python

I am trying to create a visualization using Python and Mayavi.
The purpose of that visualization is to show a trajectory and camera frustums at different stages of the path.
The thing I struggle with is texturing the camera frustum polygons with actual images.
I am willing to put performance considerations aside for now, and want to find a way to texture a mayavi-created surface with an image provided by numpy.
The most promising suggestions were found there, yet I was unable to construct a surface when I implemented them.
def render_image(self, frustum, timestamp):
    surf = mayavi.mlab.surf(frustum[0, :-1],
                            frustum[1, :-1],
                            frustum[2, :-1],
                            color=(1.0, 1.0, 1.0))
That's the code for surface creation, where the rows of the numpy array frustum are the x, y, z coordinates respectively. The last (fifth) point is the tip of the pyramid and hence not needed for the mesh.
x [-8.717184671492793, -8.623419637172622, -8.363581977642212, -8.269816943322041]
y [-4.563044562134721, -4.941612408713827, -4.37100415350352, -4.749572000082626]
z [13.614485323873417, 13.703336344550703, 14.059553426925493, 14.148404447602779]
That is an example of function input - four 3D points representing vertices of a desired polygon.
Yet, the surf function fails on that input:
File "/usr/local/lib/python2.7/dist-packages/mayavi/tools/helper_functions.py", line 679, in __call_internal__
aspect_ratios = [(zf - zi) / (xf - xi), (zf - zi) / (yf - yi)]
ZeroDivisionError: float division by zero
Note: I was able to render images with mayavi.mlab.imshow, but I find it error-prone and onerous to specify image pose and size in terms of axis angles and scale vectors, so I'm reluctant to accept answers pointing to that direction.
Your help is greatly appreciated.
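A possible way around the ZeroDivisionError, offered as an untested sketch: mlab.surf is designed for elevation data given as a 2D array over a grid, whereas mlab.mesh accepts arbitrary 2D arrays of x, y, z coordinates. Four corner points can be arranged as 2x2 arrays for mlab.mesh. The helper names quad_to_mesh_arrays and draw_quad are hypothetical, and the reshape assumes the four corners are stored in row-major grid order, which the sample values in the question appear to satisfy. Texturing the resulting surface is not covered here.

```python
import numpy as np


def quad_to_mesh_arrays(frustum):
    """Arrange the first four points of a (3, 5) frustum array as 2x2 grids."""
    corners = np.asarray(frustum, dtype=float)[:, :4]
    # Assumes corner order [[p0, p1], [p2, p3]] (row-major grid order)
    return tuple(axis.reshape(2, 2) for axis in corners)


def draw_quad(frustum):
    """Draw the frustum base as a single quad (requires mayavi)."""
    from mayavi import mlab  # imported lazily so the reshape helper works without mayavi
    x, y, z = quad_to_mesh_arrays(frustum)
    return mlab.mesh(x, y, z, color=(1.0, 1.0, 1.0))
```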
I managed to draw textured cameras with mayavi!
However, I did it using mlab.imshow, so this may be the type of answer you don't want. See this code:
obj=mlab.imshow(image.T)
obj.actor.orientation = np.rad2deg(camera.w_Rt_c.euler)
pp = np.array([0, 0, camera.f])[:,None]
w_pp = camera.w_Rt_c.forward(pp)
obj.actor.position = w_pp.ravel()
obj.actor.scale = [0.8, 0.8, 0.8]
image is an (n,m) numpy array. For some reason imshow shows the image rotated by 90 degrees, which is why I transpose it.
obj.actor.orientation expects yaw, pitch and roll angles in degrees. The rotation of the image is the product of the individual rotation matrices Rx(yaw)*Ry(pitch)*Rz(roll). In the code I use the camera-to-world Euler angles of my camera class (I can't share that code at the moment).
The position of the image is set to the 3D position of my camera's principal point, transformed to world coordinates.
Why the scale factor has to be 0.8 is a mystery to me. If I leave it at 1, the image plane appears larger than the frustum.
I encapsulate the above in a class that expects a camera and an image and draws the frustum and the image at the position and orientation of the given camera.
