Set opencv images/numpy array values using an array of pixels - python

Attempting to do forward warping of a homography matrix in OpenCV. You don't have to know what that means to understand the issue though.
Assume there are 2 images (an image is a 2D Numpy array of pixel values), A and B, and an array match that looks like
[[ 6.96122642e+01 -1.06556338e+03 1.02251944e+00]
[ 6.92265938e+01 -1.06334423e+03 1.02246589e+00]
[ 6.88409234e+01 -1.06112508e+03 1.02241234e+00]
... ]
The first column is X, second Y, and third is a scalar. These XY values are image A pixel indices and correspond to the imageB indexes
[[0,0],
[0,1],
[0,2]
... ]
I want to use this info to quickly set imageB values from imageA. I have this working but it is not as fast as I'd like
yAs = np.int32(np.round(match[:, 0] / match[:, 2]))
xAs = np.int32(np.round(match[:, 1] / match[:, 2]))
it = np.nditer(pixelsImageB[0], flags=['f_index'])
while not it.finished:
    i = it.index
    xA = xAs[i]
    yA = yAs[i]
    if in_bounds(xA, yA, imageA):
        yB = pixB[0][i]
        xB = pixB[1][i]
        imageB[xB, yB] = imageA[xA, yA]
    it.iternext()
But I'm not sure how to make this fast in NumPy; naively doing this loop is very slow. I'm a total scrub at advanced indexing, broadcasting, and the like. Any ideas?

The fastest way would be to not reinvent the wheel and use OpenCV's warpPerspective function (cv2.warpPerspective in the current Python bindings).
Alternatively, you can use Pillow's Image.transform method, which according to the docs also supports bicubic interpolation, which should produce better-quality output.
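If you would rather stay in NumPy, the loop in the question can be replaced with a boolean mask plus integer-array indexing. The following is a minimal sketch, not a drop-in replacement: the function name forward_copy and the argument pix_b are illustrative, and the bounds test stands in for the question's in_bounds helper (which is not shown), under the assumption that xA indexes axis 0 of imageA and yA indexes axis 1.

```python
import numpy as np

def forward_copy(image_a, image_b, match, pix_b):
    """Copy pixels from image_a into image_b without a Python loop.

    match: (N, 3) array of homogeneous coordinates into image_a.
    pix_b: pair of length-N integer arrays, used as in the question's loop.
    """
    # Same column convention as the question: column 0 -> yA, column 1 -> xA.
    y_a = np.round(match[:, 0] / match[:, 2]).astype(np.int32)
    x_a = np.round(match[:, 1] / match[:, 2]).astype(np.int32)

    # Assumed vectorized equivalent of in_bounds(xA, yA, imageA).
    ok = (
        (x_a >= 0) & (x_a < image_a.shape[0])
        & (y_a >= 0) & (y_a < image_a.shape[1])
    )

    # Keep only the target pixels whose source pixel is inside image_a.
    y_b = np.asarray(pix_b[0])[ok]
    x_b = np.asarray(pix_b[1])[ok]

    # One assignment replaces the whole while loop.
    image_b[x_b, y_b] = image_a[x_a[ok], y_a[ok]]
    return image_b
```

The division, rounding and bounds test all operate on whole columns at once, so no Python-level iteration remains.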

Related

Optimize this linear transformation for images with Numpy

Good evening,
I'm trying to learn NumPy and have written a simple Linear transformation that applies to an image using for loops:
import numpy as np
import matplotlib.pyplot as plt

# image, width and height are assumed to be defined earlier
M = np.array([
    [width, 0],
    [0, height]
])
T = np.array([
    [1, 3],
    [0, 1]
])

def transform_image(M, T):
    T_rel_M = abs(M @ T)
    new_img = np.zeros(T_rel_M.sum(axis=1).astype("int")).T
    for i in range(0, 440):
        for j in range(0, 440):
            x = np.array([j, i])
            coords = (T @ x)
            x = coords[0]
            y = coords[1]
            new_img[y, -x] = image[i, -j]
    return new_img

plt.imshow(transform_image(M, T))
It does what I want and spits out a transformation that is correct, except that I think there is a way to do this without the loops.
I tried doing some stuff with meshgrid but I couldn't figure out how to get the pixels from the image in the same way I do it in the loop (using i and j). I think I figured out how to apply the transformation but then getting the pixels from the image in the correct spots wouldn't work.
Any ideas?
EDIT:
Great help with the solutions below. lezaf's solution was very similar to what I tried before; the only step I couldn't figure out was assigning the pixels from the old image to the new one. I made some changes to the code to avoid transposing, and also added an astype("int") so it works with float values in the T matrix:
def transform_image(M, T):
    T_rel_M = abs(M @ T)
    new_img = np.zeros(T_rel_M.sum(axis=1).astype("int")).T
    x_combs = np.array(np.meshgrid(np.arange(width), np.arange(height))).reshape(2, -1)
    coords = (T @ x_combs).astype("int")
    new_img[coords[1, :], -coords[0, :]] = image[x_combs[1, :], -x_combs[0, :]]
    return new_img
A more efficient solution is the following:
def transform_image(M, T):
    T_rel_M = abs(M @ T)
    new_img = np.zeros(T_rel_M.sum(axis=1).astype("int")).T
    # This one replaces the double for-loop
    x_combs = np.array(np.meshgrid(np.arange(440), np.arange(440))).T.reshape(-1, 2)
    # Calculate the new coordinates
    coords = (T @ x_combs.T)
    # Apply changes to new_img
    new_img[coords[1, :], -coords[0, :]] = image[x_combs[:, 1], -x_combs[:, 0]]
    return new_img
I updated my solution to remove the for loop, so it is now a lot more straightforward.
After this change, the optimized code takes 50 ms, compared to the initial 3.06 s of the code in the question.
There seems to be some confusion between width/height, x/y, and so on, so I'm not 100% sure my code won't need adaptation. But I think the main idea is the one you are looking for:
def transform_image(M, T):
    T_rel_M = abs(M @ T)
    j, i = np.meshgrid(range(width), range(height))
    ji = np.array((j.flatten(), i.flatten()))
    coords = (T @ ji).astype(int)
    new_img = np.zeros((coords[1].max() + 1, coords[0].max() + 1), dtype=np.uint8)
    new_img[coords[1], coords[0]] = image.flatten()
    return new_img
The main idea here is to build the set of coordinates of the input image with meshgrid. I don't want a 2D array of coordinates, just a list of coordinates (a list of pairs i, j). Hence the flatten. So ji is a huge 2×N array, N being the number of pixels (so width×height).
coords is the transformation of all those coordinates.
Since your original code seemed to have some inconsistency with size (the rotated image did not fit in new_img), I chose the easy way to compute the size of new_img and just took the max of those coordinates (a bit overkill: the max of the four corners would be enough).
Then I use this set of coordinates as indices into new_img, to which I assign the matching pixels, that is, the flattened image.
So, no for loop at all.
(Note that I've also dropped the -x thing, just because I struggled to understand it. I could have put it back now that I have a working solution, but I am not 100% sure it wasn't there because you had also tried some strange adjustment by trial and error. Anyway, I think what you were looking for is how to use meshgrid to create a set of coordinates and process them without a loop. Even if you need to adapt my solution, you have it: flatten the coordinates from meshgrid, transform them with a matrix multiplication, and use them as indices for where to place all the pixels of the original image.)
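To make the flatten, transform, assign sequence concrete, here is a small self-contained sketch on a 2×3 image. The function name shear_image and the tiny test image are illustrative only, and the sketch assumes that all transformed coordinates are non-negative, which holds for the shear matrix T from the question.

```python
import numpy as np

def shear_image(image, T):
    """Place every pixel of a 2D image at its transformed (x, y) position."""
    height, width = image.shape

    # j holds the column (x) index of every pixel, i holds the row (y) index.
    j, i = np.meshgrid(np.arange(width), np.arange(height))

    # 2 x N list of (x, y) pairs, in the same order as image.ravel().
    ji = np.array((j.ravel(), i.ravel()))

    # Transform all coordinates with one matrix product.
    coords = (T @ ji).astype(int)

    # Size the output from the largest transformed coordinates.
    new_img = np.zeros((coords[1].max() + 1, coords[0].max() + 1), dtype=image.dtype)

    # Integer-array indexing scatters all pixels in one assignment.
    new_img[coords[1], coords[0]] = image.ravel()
    return new_img

image = np.arange(1, 7, dtype=np.uint8).reshape(2, 3)   # [[1, 2, 3], [4, 5, 6]]
T = np.array([[1, 3],
              [0, 1]])
out = shear_image(image, T)
print(out)
# [[1 2 3 0 0 0]
#  [0 0 0 4 5 6]]
```

Each row is shifted right by three times its row index, which is what the shear x' = x + 3y, y' = y describes.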
Edit: variant
def transform_image(M, T):
    T_rel_M = abs(M @ T)
    ji = np.array(np.meshgrid(range(width), range(height)))
    coords = np.einsum('ik,kjl', T, ji).astype(int)
    new_img = np.zeros((max(coords[1, 0, -1], coords[1, -1, 0], coords[1, -1, -1]) + 1,
                        max(coords[0, 0, -1], coords[0, -1, 0], coords[0, -1, -1]) + 1),
                       dtype=np.uint8)
    new_img[coords[1].flatten(), coords[0].flatten()] = image.flatten()
    return new_img
The idea is the same, but instead of flattening the original ji coordinates right away, I keep them as they are. I then use einsum to perform the matrix multiplication on a 3D array. It also returns a 3D array, of shape 2×height×width, in which each coords[:, i, j] is just the transformation of [j, i]. So it is the same as the previous @, except that it works on a 2×height×width set of coordinates instead of a 2×N one.
This has 2 advantages:
Apparently it is noticeably faster to create ji this way.
It allows using just the corners to find the size of the new image, as I mentioned before (that was more difficult when coords was flattened from its creation).
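As a quick sanity check of the claim that the einsum call is the same matrix product applied to a grid, the sketch below compares it with the flattened @ version on a small grid. The sizes width = 4 and height = 3 are arbitrary values chosen for illustration.

```python
import numpy as np

width, height = 4, 3
T = np.array([[1, 3],
              [0, 1]])

# meshgrid with default 'xy' indexing gives arrays of shape (height, width),
# so ji has shape (2, height, width).
ji = np.array(np.meshgrid(np.arange(width), np.arange(height)))

# Contract T's second axis with ji's first axis; the grid axes are kept.
coords_grid = np.einsum('ik,kjl', T, ji)

# Same product on the flattened 2 x N coordinate list.
coords_flat = T @ ji.reshape(2, -1)

print(coords_grid.shape)                                        # (2, 3, 4)
print(np.array_equal(coords_grid.reshape(2, -1), coords_flat))  # True

# The far corner of the grid is the transform of (width - 1, height - 1).
print(coords_grid[:, -1, -1])                                   # [9 2]
```

Keeping the grid shape is what makes the corner lookups such as coords[1, -1, -1] possible.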
Timing

Solution       Timing
Yours          4.5 s
lezaf's        3.2 s
This one       49 ms
The variant    41 ms

How to optimize numpy operation, applying 2D condition on 3 channel RGB image?

I am trying to apply a computation from a 2D alpha image to a 3-channel RGB image. I need to update the pixel intensity in each channel based on the respective pixel value in the 2D alpha image. Below is an MWE I created to illustrate the concept.
MWE:
import numpy as np
import matplotlib.pyplot as plt

# test alpha 2D image
test_a1 = np.array([
    [0, 0, 50],
    [0, 0, 150],
    [0, 0, 225]
])
# test 3 channel RGB image
test_ir1 = np.ones((3, 3, 3))
# getting indices of alpha where cond is satisfied
idx = np.unravel_index(np.where(test_a1.ravel() > 0), test_a1.shape)
test_output = np.zeros_like(test_ir1)
n_idx = len(idx[0][0])
# applying computation on 3 channel RGB image only where cond is satisfied.
for i in range(n_idx):
    # multiply only where test_a1 > 0
    r_idx, c_idx = idx[0][0][i], idx[1][0][i]
    test_output[r_idx, c_idx, 0] = test_a1[r_idx, c_idx] * test_ir1[r_idx, c_idx, 0]
    test_output[r_idx, c_idx, 1] = test_a1[r_idx, c_idx] * test_ir1[r_idx, c_idx, 1]
    test_output[r_idx, c_idx, 2] = test_a1[r_idx, c_idx] * test_ir1[r_idx, c_idx, 2]
test_output = test_output.astype('uint8')
plt.imshow(test_output, vmin=0, vmax=3)
output: [image omitted]
I basically tried to find the indices in the 2D alpha image where the condition is met, and tried to apply those indices to all channels of the image.
Is there a way to optimize the above operation (other than looping over channels)? I am specifically looking to avoid the for loop in the code, which does a NumPy operation for each index. It is very slow for regular images.
You might observe that:
test_output = np.einsum('ij,ijk->ijk', test_a1, test_ir1)
Not sure if this helps. If this isn't precisely what you intend, please reformulate a slightly different MWE.
=== edited ===
I would still make use of einsum since it allows you a lot of control of vectorised multidimensional linear algebra.
Provided you can reduce your operations to some mathematical description, for example:
Where test_a1 is greater than zero, double the intensity of the pixels measured over each channel.
Then you do that in the following way:
mask = test_a1 > 0
output = np.einsum('ij,ijk->ijk', mask, test_ir1) + test_ir1
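The same results can also be obtained with plain broadcasting, which some readers find easier to read than einsum. This is a sketch of an alternative spelling using the arrays from the question, not a claim that it is faster: adding a trailing axis to the 2D array lets it broadcast across the three channels, and np.where expresses the "only where the mask holds" rule directly.

```python
import numpy as np

test_a1 = np.array([
    [0, 0, 50],
    [0, 0, 150],
    [0, 0, 225]
])
test_ir1 = np.ones((3, 3, 3))

# einsum version from the answer: scale every channel by the alpha value.
by_einsum = np.einsum('ij,ijk->ijk', test_a1, test_ir1)

# Broadcasting version: (3, 3, 1) * (3, 3, 3) -> (3, 3, 3).
by_broadcast = test_a1[:, :, np.newaxis] * test_ir1

# "Double the intensity where test_a1 > 0", written with np.where.
mask = test_a1 > 0
doubled = np.where(mask[:, :, np.newaxis], 2 * test_ir1, test_ir1)

print(np.array_equal(by_einsum, by_broadcast))  # True
print(doubled[0, 2], doubled[0, 0])             # [2. 2. 2.] [1. 1. 1.]
```

Where the alpha value is zero the product is zero as well, so the first two lines reproduce the output of the original loop before its uint8 cast.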

Numpy remove a dimension from np array

I have some images I want to work with. The problem is that there are two kinds of images: both are 106 x 106 pixels, but some are in color and some are black and white.
one with only two (2) dimensions:
(106,106)
and one with three (3)
(106,106,3)
Is there a way I can strip this last dimension?
I tried np.delete, but it did not seem to work.
np.shape(np.delete(Xtrain[0], [2] , 2))
Out[67]: (106, 106, 2)
You could use NumPy's multidimensional indexing (an extension of Python's built-in slice notation):
x = np.zeros( (106, 106, 3) )
result = x[:, :, 0]
print(result.shape)
prints
(106, 106)
A shape of (106, 106, 3) means you have 3 sets of things that have shape (106, 106). So in order to "strip" the last dimension, you just have to pick one of these (that's what the indexing does).
You can keep any slice you want. I arbitrarily chose to keep the 0th, since you didn't specify what you wanted. So, result = x[:, :, 1] and result = x[:, :, 2] would give the desired shape as well: it all just depends on which slice you need to keep.
If you have a multidimensional array, this might help:
pred_mask[0, ...]  # Remove first dim
pred_mask[..., 0]  # Remove last dim
Just take the mean value over the colors dimension (axis=2):
Xtrain_monochrome = Xtrain.mean(axis=2)
When the shape of your array is (106, 106, 3), you can visualize it as a table with 106 rows and 106 columns in which each data point is an array of 3 numbers, which we can represent as [x, y, z]. Therefore, to get the dimensions (106, 106), the data points in your table must be single numbers rather than arrays. You can achieve this by extracting the x-, y- or z-component of each data point, or by applying a function that aggregates the three components, such as the mean, sum or max. You can extract any component just as @matt Messersmith suggested above.
Well, you should be careful when you are trying to reduce the dimensions of an image.
An image is normally a 3-D array that contains the RGB values of each pixel. If you want to reduce it to 2-D, what you are really doing is converting a color RGB image into a grayscale image.
There are several ways to do this: you can take the maximum of the three values, the minimum, the average, the sum, etc., depending on the accuracy you want in your image. The best you can do is take a weighted average of the RGB values using the formula
Y = 0.299R + 0.587G + 0.114B
where R stands for RED, G is GREEN and B is BLUE. In numpy, this can be written as
new_image = img[:, :, 0]*0.299 + img[:, :, 1]*0.587 + img[:, :, 2]*0.114
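Since the question mixes 2D and 3D images, a small helper that leaves grayscale images alone and converts color ones may be convenient. This is a sketch under the assumption that color images store R, G, B in that order along the last axis; the names to_grayscale and LUMA_WEIGHTS are illustrative.

```python
import numpy as np

# Weights from the formula Y = 0.299R + 0.587G + 0.114B.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(img):
    """Return a 2D array for either a (H, W) or a (H, W, 3) input."""
    if img.ndim == 2:
        # Already single-channel: nothing to strip.
        return img
    # Matrix product over the last axis: (H, W, 3) @ (3,) -> (H, W).
    return img[..., :3] @ LUMA_WEIGHTS

color = np.random.default_rng(0).random((106, 106, 3))
gray = np.zeros((106, 106))

print(to_grayscale(color).shape)  # (106, 106)
print(to_grayscale(gray).shape)   # (106, 106)
```

The @ product computes the same weighted sum as the explicit three-term expression, just in one call.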
Actually, np.delete would work if you applied it twice.
If you want to preserve the first channel, for example, you could run the following:
Xtrain = np.delete(Xtrain,2,2) # this will get rid of the 3rd component of the 3 dimensions
print(Xtrain.shape) # will now output (106,106,2)
# again we apply np.delete but on the second component of the 3rd dimension
Xtrain = np.delete(Xtrain,1,2)
print(Xtrain.shape) # will now output (106,106,1)
# you may finally squeeze your output to get a 2d array
Xtrain = Xtrain.squeeze()
print(Xtrain.shape) # will now output (106,106)

Pythonic way to create a numpy array of coordinates

I'm trying to create a numpy array of coordinates. Up until now, I've been just using x_coords, y_coords = numpy.indices((shape)). Now, however, I want to combine x_coords and y_coords into one array, such that x_coords = thisArray[:,:,0] and y_coords = thisArray[:,:,1]. In this case, shape is two-dimensional. Is there a simple or pythonic way to do this?
I originally thought about using numpy.outer, but that doesn't quite give me what I need. A possible idea is just using concatenation of the indices array along the (2nd?) axis, but that doesn't seem like a very elegant solution. (it may be the cleanest one here though).
Thanks!
What np.indices returns is already an array, but x_coords = thisArray[0, :, :] and y_coords = thisArray[1, :, :]. Unless you have very strict requirements for your array of coordinates (namely that it be contiguous), you can take a view of that array with the first axis rolled to the end:
thisArray = numpy.rollaxis(numpy.indices(shape), 0, len(shape)+1)
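The same view can be written with np.moveaxis, which the NumPy documentation recommends over rollaxis for new code. The sketch below uses an arbitrary example shape of (4, 5) and also shows how to get a contiguous copy if that is required.

```python
import numpy as np

shape = (4, 5)

# np.indices(shape) has shape (2, 4, 5); move the first axis to the end.
coords_view = np.moveaxis(np.indices(shape), 0, -1)

# A contiguous copy, for code that requires contiguous memory.
coords_copy = np.ascontiguousarray(coords_view)

print(coords_view.shape)                    # (4, 5, 2)
print(coords_view[2, 3])                    # [2 3]
print(coords_copy.flags['C_CONTIGUOUS'])    # True
```

With this layout, coords_view[:, :, 0] holds the first index of every position and coords_view[:, :, 1] the second, as the question asks.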

Fancier Fancy Indexing in NumPy?

I am implementing color interpolation using a look-up-table (LUT) with NumPy. At one point I am using the 4 most significant bits of RGB values to choose corresponding CMYK values from a 17x17x17x4 LUT. Right now it looks something like this:
import numpy as np
rgb = np.random.randint(16, size=(3, 1000, 1000))
lut = np.random.randint(256, size=(17, 17, 17, 4))
cmyk = lut[rgb[0], rgb[1], rgb[2]]
Here comes the first question... Is there no better way? It sort of seems natural that you could tell NumPy that the indices for lut are stored along axis 0 of rgb, without having to actually write it out. So is there anything like cmyk = lut.fancier_take(rgb, axis=0) in NumPy?
Furthermore, I am left with an array of shape (1000, 1000, 4), so to be consistent with the input, I need to rotate it all around using a couple of swapaxes:
cmyk = cmyk.swapaxes(2, 1).swapaxes(1, 0).copy()
And I also need to add the copy statement, because if not the resulting array is not contiguous in memory, and that brings trouble later on.
Right now I am leaning towards rotating the LUT before the fancy indexing and then do something along the lines of:
swapped_lut = lut.swapaxes(2, 1).swapaxes(1, 0)
cmyk = swapped_lut[np.arange(4), rgb[0], rgb[1], rgb[2]]
But again, it just does not seem right... There has to be a more elegant way to do this, right? Something like cmyk = lut.even_fancier_take(rgb, in_axis=0, out_axis=0)...
I'd suggest using tuple to force row-wise indexing, and np.rollaxis or transpose instead of swapaxes:
lut[tuple(rgb)].transpose(2, 0, 1).copy()
or
np.rollaxis(lut[tuple(rgb)], 2).copy()
To roll the axis first, use:
np.rollaxis(lut, -1)[(Ellipsis,) + tuple(rgb)]
If you swap lut, np.arange(4) will not work; you'll need to do the following instead:
swapped_lut = np.rollaxis(lut, -1)
cmyk = swapped_lut[:, rgb[0], rgb[1], rgb[2]].copy()
Or you can replace
cmyk = lut[rgb[0], rgb[1], rgb[2]]
cmyk = cmyk.swapaxes(2, 1).swapaxes(1, 0).copy()
with:
cmyk = lut[tuple(rgb)]
cmyk = np.rollaxis(cmyk, -1).copy()
But to try and do it all in one step, ... Maybe:
rng = np.arange(4).reshape(4, 1, 1)
cmyk = lut[rgb[0], rgb[1], rgb[2], rng]
That's not very readable at all, is it?
Take a look at the answer to this question, Numpy multi-dimensional array indexing swaps axis order. It does a good job of explaining how numpy broadcasts multiple arrays to get the output size. Here you want to create indices into lut that broadcast to (4, 1000, 1000). Hope that makes some sense.
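To check that the two approaches agree, here is a small sketch comparing "index, then move the channel axis" with the one-step broadcast version. The image size is reduced to 5×6 to keep it quick, np.moveaxis is used in place of rollaxis, and the name channel (instead of rng above) is illustrative.

```python
import numpy as np

gen = np.random.default_rng(0)
rgb = gen.integers(16, size=(3, 5, 6))            # indices along axis 0
lut = gen.integers(256, size=(17, 17, 17, 4))     # look-up table

# Two steps: index with the tuple of planes, then move channels to the front.
two_step = np.moveaxis(lut[tuple(rgb)], -1, 0).copy()

# One step: a (4, 1, 1) channel index broadcasts against the (5, 6) planes,
# so the result comes out with shape (4, 5, 6) directly.
channel = np.arange(4).reshape(4, 1, 1)
one_step = lut[rgb[0], rgb[1], rgb[2], channel]

print(two_step.shape, one_step.shape)             # (4, 5, 6) (4, 5, 6)
print(np.array_equal(two_step, one_step))         # True
print(two_step.flags['C_CONTIGUOUS'])             # True
```

In both cases element [c, y, x] is lut[rgb[0, y, x], rgb[1, y, x], rgb[2, y, x], c]; only the route to that layout differs.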
