Object detection: almost identical objects - python

I am struggling to get my KNN object detection script to tell the difference between two almost identical objects.
The obvious solution, which I am doing now, is to add more reference images.
I was wondering if there is anything else I could do to distinguish between the two objects more easily. They are both beverage containers with the same shape and colors, where the first has a slight bluish tint and the other is just plain white with a shiny finish.
Right now I am feeding the image of my objects to VGG19 and capturing the output of its 1x4096 flattened layer, then appending the five most dominant colors in RGB format along with the pixel height and width of the image crop of the object. This gives me a 4113-length one-dimensional vector on which I can perform the KNN search.
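A rough sketch of how such a 4113-length vector can be assembled, assuming Keras' VGG19 (whose 4096-wide fully connected layer is named "fc2") and scikit-learn's KMeans for the five dominant colors; the layer name and the cluster count are assumptions here:

import numpy as np
import cv2
from sklearn.cluster import KMeans
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.models import Model

base = VGG19(weights="imagenet")
fc2 = Model(inputs=base.input, outputs=base.get_layer("fc2").output)  # 4096-wide layer

def describe(crop_bgr):
    h, w = crop_bgr.shape[:2]
    rgb = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2RGB).astype("float32")
    # 4096 deep features from the fully connected layer
    deep = fc2.predict(preprocess_input(cv2.resize(rgb, (224, 224))[None]), verbose=0).ravel()
    # five most dominant colors (RGB) via k-means over all pixels
    colors = KMeans(n_clusters=5, n_init=10).fit(rgb.reshape(-1, 3)).cluster_centers_.ravel()
    return np.concatenate([deep, colors, [h, w]])  # 4096 + 15 + 2 = 4113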
I am curious whether things like saturation and contrast of the image will help the KNN search find the small differences, or if there is something else I could try?
Best regards
Martin

Related

Apply bundle adjustment to rectify images globally in the context of image stitching in python/openCV

I am trying to perform image registration on potentially hundreds of aerial images taken from a camera mounted on a UAV. I think it is safe to assume that I know the ordering of the images, and hopefully, sequential images will overlap.
I have read some papers that suggest using a CNN to find the homography matrix can vastly outperform the old-school feature descriptor matching with the RANSAC song and dance. My issue is that I don't quite understand how to stitch more than 2 images together. It seems to me that to register image 100 in the same coordinate frame as image 1 using the cv2.warpPerspective function, I would have to warp I100 with the chained product H1*H2*H3*...*H99. Even if the error in each transform is small, after 100 applications it seems like it would become huge. My understanding is that the solution to this problem is bundle adjustment.
I have looked into bundle adjustment a little bit but I'm struggling to see how exactly I can use it. I have read the paper that many related Stack Overflow posts suggest, "Automatic Panoramic Image Stitching using Invariant Features". In the section on bundle adjustment, if I understand correctly, the authors suggest that after building the initial panorama it is likely that image A will eventually overlap with multiple other images. Using the matched feature points in any images that overlap with A, they basically calculate some adjustment... I think to image A?
My question is: using OpenCV, how do I apply this adjustment? Let's say I have 3 images I1, I2, I3, all overlapping, for a minimal example.
# assuming the CNN model predicts the transform
# I think the first step is to find the homography between all images
H12 = cnnMod.predict(I1, I2)
H13 = cnnMod.predict(I1, I3)
H23 = cnnMod.predict(I2, I3)
outI2 = cv2.warpPerspective(I2, H12, (maxWidth, maxHeight), flags=cv2.INTER_LINEAR)
outI3 = cv2.warpPerspective(I3, H13, (maxWidth, maxHeight), flags=cv2.INTER_LINEAR)
# now would I do some bundle voodoo?
# what would it look like?
# which of the bundler classes should I use?
# would it look like this?
# or maybe the input is features?
voodoo = cv2.bundleVoodoo([H12, H13, H23])
globallyRectifiedI2 = cv2.warpPerspective(outI2, voodoo[2], (maxWidth, maxHeight), flags=cv2.INTER_LINEAR)
The code is my best guess at what a solution might look like but clearly I have no idea what I am doing. I've not been able to find anything that actually shows how the bundle adjustment is done.
The basic idea underlying image alignment through bundle adjustment is that, rather than matching pairs of 2D points (x, x') across pairs of images, you posit the existence of 3D points X that, ideally, project onto matched tuples of 2D points (x, x', x'', ...) matched among corresponding tuples of images. You then solve for the locations of the X's and the camera parameters (extrinsics, and intrinsics if the camera is uncalibrated) that minimize the (usually robustified) RMS reprojection error over all 2D points and images.
Depending on your particular setup and scene, you may make some simplifying assumptions, e.g.:
That the X's all belong to the same plane (which you can arbitrarily choose as the world's Z=0 plane). This is useful, for example, when stitching images of a painting, or aerial images on relatively flat ground with relatively small extent so one can ignore the earth's curvature.
Or that the X's are all on the WGS84 ellipsoid.
Both the above assumptions remove one free coordinate from X, effectively reducing the problem's dimensionality.
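As a rough illustration of what the planar case can look like in code (this is not an OpenCV built-in; it uses SciPy's least_squares, and all variable names are placeholders): each camera reduces to a 3x3 homography from the world plane Z=0 to its image, and the entries of every homography plus the 2D plane coordinates of every point are refined jointly so that the total reprojection error is minimized.

import numpy as np
from scipy.optimize import least_squares

def project(h8, X):
    # map plane points X (N,2) through the homography whose first 8 entries are h8
    H = np.append(h8, 1.0).reshape(3, 3)
    p = (H @ np.c_[X, np.ones(len(X))].T).T
    return p[:, :2] / p[:, 2:3]

def residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs):
    # reprojection error of every (camera, point) observation
    cams = params[:n_cams * 8].reshape(n_cams, 8)
    pts = params[n_cams * 8:].reshape(n_pts, 2)
    proj = np.vstack([project(cams[c], pts[[p]]) for c, p in zip(cam_idx, pt_idx)])
    return (proj - obs).ravel()

# cam_idx[k], pt_idx[k], obs[k] mean: plane point pt_idx[k] was seen in image cam_idx[k] at pixel obs[k].
# Initialize the homographies from the pairwise CNN predictions and the plane points from
# back-projected matches, then refine everything jointly:
# x0 = np.concatenate([cams_init.ravel(), pts_init.ravel()])
# fit = least_squares(residuals, x0, args=(n_cams, n_pts, cam_idx, pt_idx, obs), loss="huber")
# The refined homographies recovered from fit.x are then what you pass to cv2.warpPerspective.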

Non-linear approximation of an image of a geometry

I am trying to approximate different shapes of a weld bead geometry cross section in additive manufacturing with a graph or ideally (but not necessarily) a function. The regions are the outer shape as well as the individual layers. (see following images)
Therefore, I applied some pre-processing methods to extract the relevant pixels, which represent the geometry of a weld bead and are shown as white pixels (see third image).
I derived this image with Canny edge detection and, prior to that, multiple morphological operations such as closing, erosion and dilation, and of course conversion to grey-scale.
The "noisy" areas are the transition areas between individual layers of metal and only show up in this way, so in general there is not a "better" or "sharper" transition in thus less "noise". Pictures 3 and 4 are an example of some of the image pre-processing methods I used.
My main approach to treating the inner geometry so far was to split up the image into several sub-images and perform least squares regression on each individual one, interpreting the white pixels as data points. Afterwards I stitched all those little approximation functions back together to form an image of the original size. I have tried this with different sizes of those sub-images (see pictures 5 and 6).
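A rough sketch of this per-strip fitting, assuming numpy and a binary image mask of the white geometry pixels; the strip width and polynomial degree are placeholders:

import numpy as np

def fit_strips(mask, strip_width=50, degree=3):
    # fit one polynomial y = f(x) to the white pixels of each vertical strip
    fits = []
    for x0 in range(0, mask.shape[1], strip_width):
        ys, xs = np.nonzero(mask[:, x0:x0 + strip_width])
        if len(xs) > degree:  # need enough points for a stable fit
            fits.append((x0, x0 + strip_width, np.polyfit(xs + x0, ys, degree)))
    # adjacent pieces are fitted independently and are not forced to connect
    return fits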
However, this approach produces jumps between the functions, as well as separate functions lying next to each other where the pixels (my data points) should really be approximated by a single function (see attached image). My next approach would be to use multivariate adaptive regression on the sub-images.
Thus, I'm asking if anybody knows a better solution for my problem, maybe even an approximation on a global scale without splitting the image into sub-images. The approximation does not need to be a polynomial function; piecewise linear but connected functions are totally sufficient. I would be thankful if anybody knows a method that is at least capable of achieving what I want to do, whether it is a pure non-linear regression method or something else. Unfortunately I don't have many images (only 64), hence I don't think I can use an ANN (please correct me if I'm wrong).
If you need to take a look at my code, just let me know. Thank you! :)
The best I could obtain is with bilateral filtering for denoising, then adaptive binarization; the same approach also works on a reduced (downscaled) image.
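A minimal sketch of that pipeline, assuming OpenCV; the kernel sizes, thresholds and filename are placeholders that would need tuning to the weld images:

import cv2

img = cv2.imread("weld_cross_section.png", cv2.IMREAD_GRAYSCALE)  # hypothetical filename
denoised = cv2.bilateralFilter(img, 9, 75, 75)  # d=9, sigmaColor=75, sigmaSpace=75
binarized = cv2.adaptiveThreshold(denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                  cv2.THRESH_BINARY, 31, 5)  # blockSize=31, C=5
# for the reduced-image variant, repeat the two steps above on a downscaled copy
small = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)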

Measuring an object in a photograph

I'm trying to work out how to measure an object in a photograph. I want to measure the actual, real-world size of it. Luckily, this object has a scale in cm. My thinking is that I measure the pixels in the scale and use that to then determine the size of the other object(s) in the photo. I've been working on this in scikit-image with mixed results. One of the issues I get is that the resolution of the image changes the measurement, so it seems that thresholding for the scale and extracting pixel counts does not work.
I know that OpenCV has the ability to measure objects with a bounding box, however the objects I'm trying to measure have uneven sides/edges, and this needs to be accounted for (the shape of the object is important too, and I can capture that using thresholding and contours).
I'm hoping that people on this board can point me in alternate/better directions for trying to solve this issue. Perhaps my approach is all wrong. Thank you.
Example photo of vase with 10 cm scale. (https://i.stack.imgur.com/dK2BZ.jpg)
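One way around the resolution issue is to calibrate against the scale that sits in the same photo, so the cm-per-pixel factor is re-derived for every image and the absolute pixel counts cancel out. A rough scikit-image sketch, where the filename and the rule for picking out the scale bar (the widest thresholded region) are only placeholders:

import numpy as np
from skimage import io, color, filters, measure

img = color.rgb2gray(io.imread("vase_with_scale.jpg"))  # hypothetical filename
mask = img < filters.threshold_otsu(img)                # assumes dark objects on a light background
regions = measure.regionprops(measure.label(mask))

# assume the 10 cm scale bar is the widest region; its pixel length calibrates the photo
scale = max(regions, key=lambda r: r.bbox[3] - r.bbox[1])
px_per_cm = (scale.bbox[3] - scale.bbox[1]) / 10.0

# assume the vase is the largest region; its bounding box (or contour) is then converted to cm
vase = max(regions, key=lambda r: r.area)
height_cm = (vase.bbox[2] - vase.bbox[0]) / px_per_cm
print(f"estimated object height: {height_cm:.1f} cm")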

Image Operations with Python

I hope you're all doing well!
I'm new to image manipulation, so I want to apologize right here for my simple question. I'm currently working on a problem that involves classifying an object called a jet into two known categories. This object is made of sub-objects. My idea is to use these sub-objects to transform each jet into a pixel image, and then apply convolutional neural networks to find the patterns.
Here is an example of the pixel images:
jet's constituents pixel distribution
To standardize all the images, I want to find the two most intense pixels and make sure the axis connecting them is in the vertical direction, as well as make sure that the most intense pixel is at the top. It also would be good to impose that one of the sides (left or right) of the image contains the majority of the intensity and to normalize the intensity of the whole image to 1.
My question is: as I'm new to this kind of processing, I don't know if there is a library in Python that can handle these operations. Are you aware of any?
PS: the picture was taken from here: https://arxiv.org/abs/1407.5675
You can look into OpenCV library for Python:
https://docs.opencv.org/master/d6/d00/tutorial_py_root.html.
It supports a lot of image processing functions.
In your case, it would probably be easier to convert the image into a more suitable color space in which one axis stands for intensity (e.g. HSI, HSL, HSV) and then find the indices of the maximum values along this axis (this should return the pixels with the highest intensity in the image).
Generally, in Python, we use the PIL library for basic manipulations of images and OpenCV for advanced ones.
But, if I understand your task correctly, you can just think of an image as a multidimensional array and use numpy to manipulate it.
For example, if your image is stored in a variable of type numpy.array called img, you can find the maximum value along the desired axis just by writing:
img.max(axis=0)
To normalize the image (assuming img is a float array) you can use:
img /= img.max()
To find which image part is brighter, you can split the img array into the desired parts and calculate their means:
left = img[:, :int(img.shape[1]/2), :]
right = img[:, int(img.shape[1]/2):, :]
left_mean = left.mean()
right_mean = right.mean()
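Building on that, here is a rough numpy/scipy sketch of the standardization described in the question: rotate so the two brightest pixels are vertically aligned with the brightest one on top, flip so the left half carries the majority of the intensity, and normalize the total intensity to 1. It assumes a 2D intensity image; the rotation-angle sign follows scipy.ndimage's convention, and picking the left side is an arbitrary choice:

import numpy as np
from scipy import ndimage

def standardize_jet(img):
    img = img.astype(float)
    # locations of the two most intense pixels (index 1 = brightest)
    rows, cols = np.unravel_index(np.argsort(img.ravel())[-2:], img.shape)
    r2, r1 = rows
    c2, c1 = cols
    # rotate so the axis joining them is vertical, with the brightest pixel on top
    angle = np.degrees(np.arctan2(c1 - c2, r2 - r1))
    img = ndimage.rotate(img, angle, reshape=False, order=1)
    # flip so the left half holds the majority of the intensity
    half = img.shape[1] // 2
    if img[:, half:].sum() > img[:, :half].sum():
        img = np.fliplr(img)
    return img / img.sum()  # total intensity = 1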

OpenCV: How to detect small differences in different images of the same object

I've been trying to detect whether a printed image has any defects (shape and color) when compared either to a proof of another printed image which has no defects, or to the digital version of the image, which also has no defects. I'm using OpenCV (cv2) and Python.
I first take a picture of the printed image. Then I perform a perspective transformation to get the picture of the printed image cropped sufficiently. I am then using Zernike moments, SSIM, and color histograms to compare the color and shape of the image. However, the resulting values vary too much and I am not able to set a threshold for a misprinted image.
I have also tried subdividing the image into smaller sections and comparing those. This also does not produce distinguishable enough values to determine whether there is a misprint or not.
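For reference, the tile-wise comparison can be sketched like this with scikit-image's SSIM (the tile size and filenames are placeholders); looking at the worst per-tile score highlights localized defects better than a single global value:

import cv2
from skimage.metrics import structural_similarity as ssim

ref = cv2.imread("reference_print.png")   # hypothetical proof / digital original
test = cv2.imread("captured_print.png")   # hypothetical perspective-corrected photo of the print
test = cv2.resize(test, (ref.shape[1], ref.shape[0]))

tile = 64
scores = []
for y in range(0, ref.shape[0] - tile + 1, tile):
    for x in range(0, ref.shape[1] - tile + 1, tile):
        a = cv2.cvtColor(ref[y:y + tile, x:x + tile], cv2.COLOR_BGR2GRAY)
        b = cv2.cvtColor(test[y:y + tile, x:x + tile], cv2.COLOR_BGR2GRAY)
        scores.append(ssim(a, b, data_range=255))
print("worst tile SSIM:", min(scores))  # low values flag localized defects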
The differences in the print can be subtle or very apparent. Are there any other techniques that I can try? Thanks!
This is an example of a correctly printed image:
This is an example of an incorrectly printed image, it has too much blue ink on the right side:
This is another example of a correct print:
This is an example of a misprint when compared to the one above:
