So I'm doing template matching on a colored image and skimage.feature.match_template() seems to work just fine. But I'm not sure exactly how it is performing this because although the original images are size N x N x 3, the array output is one dimensional. I theorized that it was only performing the template match on the red layer perhaps, but this doesn't seem to be the case. Does it do some sort of averaging for RGB images? I want to understand where it's getting its values from so I know that it's interpreting the image correctly. Thanks!
From the match_template() documentation:
Returns
-------
output : array
Response image with correlation coefficients.
So, the template matching uses either 2 or 3 dimensions depending on the input image and template. However, each element of the output will always be a single scalar score that tells you how well the whole template (all channels together) matched at a specific image position.
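To make the "one scalar per position" point concrete, here is a minimal brute-force sketch of normalized cross-correlation over a color image in plain NumPy. It is only an illustration under my own assumptions, not skimage's actual implementation (which, as far as I know, is FFT-based); the function name ncc_color and the random test image are hypothetical.

```python
import numpy as np

def ncc_color(image, template):
    """Brute-force normalized cross-correlation for H x W x C arrays.

    Hypothetical helper for illustration only: the template is treated as
    one 3D block, so every position yields a single correlation coefficient.
    """
    H, W, C = image.shape
    h, w, c = template.shape
    assert C == c, "image and template must have the same number of channels"
    t = template - template.mean()          # zero-mean template (all channels)
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + h, x:x + w, :]
            p = patch - patch.mean()        # zero-mean patch (all channels)
            denom = np.sqrt((p ** 2).sum()) * t_norm
            out[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    return out

rng = np.random.default_rng(0)
image = rng.random((20, 20, 3))
template = image[5:10, 7:12, :].copy()      # cut the template out of the image
scores = ncc_color(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
print(scores.shape, best)
```

With skimage itself, a 3-channel image and a 3-channel template should give a result whose last axis has length 1, which is probably why the output looks like it has lost a dimension.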
I'm currently trying to start with an original RGB image, convert it to LUV, perform some operations (namely, rotate the hues), then convert it back to RGB for display purposes. However, I'm encountering a vexing issue where the RGB-to-LUV conversion (and vice versa) seems to be changing the image. Specifically, if I begin with an LUV image, convert it to RGB, and then change it back to LUV, without changing anything else, the result differs from the original image. This has happened for both the Python (cv2) and Matlab (open source) implementations of the color conversion algorithms, as well as my own hand-coded ones based on. Here is an example:
luv1 = np.array([[[100,6.12,0]]]).astype('float32')
rgb1 = cv2.cvtColor(luv1,cv2.COLOR_Luv2RGB)
luv2 = cv2.cvtColor(rgb1,cv2.COLOR_RGB2Luv)
print(luv2)
[[[99.36293 1.3064307 -1.0494182]]]
As you can see, the LUV coordinates have changed from the input. Is this because certain LUV coordinates have no direct match in RGB space?
Yes, remove the astype('uint8') bit in your code, and the difference should disappear if the conversion is implemented correctly.
You can see the equations for the conversion in Wikipedia. There is nothing there that is irreversible, the conversions are perfect inverses of each other.
However, this conversion contains a 3rd power, which does stretch some values significantly. The rounding of the conversion to an integer can introduce a significant shift of color.
Also, the Luv domain is highly irregular and it might not be easy to verify that Luv values will lead to a valid RGB value. Your statement "I've verified that luv1 has entries that all fall in the allowable input ranges" makes me believe that you think the Luv domain is a box. It is not. The ranges for u and v change with L. One good exercise is to start with a sampling of the RGB cube, and map those to Luv, then plot those points to see the shape of the Luv domain. Wikipedia has an example of what this could look like for the sRGB gamut.
The OpenCV cvtColor function will clamp RGB values to the [0,1] range (if of type float32), leading to irreversible changes of color if the input is out of gamut.
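As a rough way to check whether a given Luv triple is out of gamut, here is a small hand-written sketch of the standard CIE Luv to XYZ to linear RGB equations. It assumes a D65 white point and the sRGB primaries, and it is not OpenCV's code; the helper names luv_to_linear_rgb and in_gamut and the tolerance are my own.

```python
# Assumed D65 reference white and XYZ -> linear sRGB matrix (4-digit rounding).
XN, YN, ZN = 0.95047, 1.0, 1.08883
UN = 4 * XN / (XN + 15 * YN + 3 * ZN)
VN = 9 * YN / (XN + 15 * YN + 3 * ZN)

def luv_to_linear_rgb(L, u, v):
    """Convert CIE L*u*v* to linear RGB without any clamping."""
    if L <= 0:
        return (0.0, 0.0, 0.0)
    up = u / (13 * L) + UN                  # u' chromaticity
    vp = v / (13 * L) + VN                  # v' chromaticity
    if L > 8:
        Y = YN * ((L + 16) / 116) ** 3
    else:
        Y = YN * L * (3 / 29) ** 3
    X = Y * 9 * up / (4 * vp)
    Z = Y * (12 - 3 * up - 20 * vp) / (4 * vp)
    r = 3.2406 * X - 1.5372 * Y - 0.4986 * Z
    g = -0.9689 * X + 1.8758 * Y + 0.0415 * Z
    b = 0.0557 * X - 0.2040 * Y + 1.0570 * Z
    return (r, g, b)

def in_gamut(L, u, v, tol=1e-3):
    """True if all linear RGB components fall inside [0, 1] (within tol)."""
    return all(-tol <= c <= 1 + tol for c in luv_to_linear_rgb(L, u, v))

# The value from the question: the red component comes out above 1,
# so a clamping conversion cannot round-trip it.
print(luv_to_linear_rgb(100, 6.12, 0), in_gamut(100, 6.12, 0))
```

Under these assumptions the luv1 value from the question lands outside the RGB cube, which is consistent with the clamping explanation above.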
Here is an example that shows that the conversion is reversible. I start with RGB values because these are easy to verify as valid:
import numpy as np
import cv2
rgb1 = np.array([[[1.0,1.0,1.0],[0.5,1.0,0.5],[0.0,0.5,0.5],[0.0,0.0,0.0]]], 'float32')
luv1 = cv2.cvtColor(rgb1, cv2.COLOR_RGB2Luv)
rgb2 = cv2.cvtColor(luv1, cv2.COLOR_Luv2RGB)
np.max(np.abs(rgb2-rgb1))
This returns 2.8897537e-06, which is numerical precision for 32-bit floats.
My question is not too far off from the "Image Alignment (ECC) in OpenCV ( C++ / Python )" article.
I also found the following article about facial alignment to be very interesting, but WAY more complex than my problem.
Wow! I can really go down the rabbit-hole.
My question is WAY more simple.
I have a scanned document that I have treated as a "template". In this template I have manually mapped the pixel regions that I require info from as:
area = (x1,y1,x2,y2)
such that x1<x2, y1<y2.
Now, these regions are, as is likely obvious, a bit too specific to my "template".
All other files that I want to extract data from are mostly shifted by some unknown amount such that their true area for my desired data is:
area = (x1 + ε1, y1 + ε2, x2 + ε1, y2 + ε2)
Where ε1, ε2 are unknown in advance.
But the documents are otherwise HIGHLY similar outside of this shift.
I want to discover, ideally through opencv, what translation is required (for the time being ignoring Euclidean transforms) to "align" these images so as to discover my ε, shift my area, and parse my data directly.
I have thought about using tesseract to mine the text from the document and then parse from there, but there are check boxes that are either filled or empty that contain meaningful information for my problem.
The code I currently have for cropping the image is:
from PIL import Image
img = Image.open(img_path)
area = area_lookup['key']
cropped_img = img.crop(area)
cropped_img.show()
My two sample files are attached.
My two images are:
We can assume my first image is my "template".
As you can see, the two images are very "similar" but one is moved slightly (human error). There may be cases where the rotation is more extreme, or the image is shifted more.
I would like to transform image 2 to be as aligned to image 1 as possible, and then parse data from it.
Any help would be sincerely appreciated.
Thank you very much
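For the pure-translation case, one possible starting point is phase correlation. The sketch below is plain NumPy and rests on my own assumptions: both pages are grayscale arrays of the same size, the shift is a whole number of pixels, and the helper names estimate_shift and shift_area are hypothetical. OpenCV also offers cv2.phaseCorrelate, which may be worth trying on the real scans.

```python
import numpy as np

def estimate_shift(template, moved):
    """Estimate (eps1, eps2) = (dx, dy) such that moved ~ template shifted by it."""
    F1 = np.fft.fft2(template)
    F2 = np.fft.fft2(moved)
    cross = np.conj(F1) * F2
    cross /= np.abs(cross) + 1e-12          # keep only the phase difference
    corr = np.fft.ifft2(cross).real         # peak sits at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint correspond to negative shifts.
    if dy > template.shape[0] // 2:
        dy -= template.shape[0]
    if dx > template.shape[1] // 2:
        dx -= template.shape[1]
    return int(dx), int(dy)

def shift_area(area, eps1, eps2):
    """Apply the estimated shift to an (x1, y1, x2, y2) crop box."""
    x1, y1, x2, y2 = area
    return (x1 + eps1, y1 + eps2, x2 + eps1, y2 + eps2)

# Synthetic check: shift a random "page" by 3 rows down and 5 columns left.
rng = np.random.default_rng(0)
template = rng.random((64, 64))
moved = np.roll(template, shift=(3, -5), axis=(0, 1))
eps1, eps2 = estimate_shift(template, moved)
print(eps1, eps2, shift_area((10, 20, 30, 40), eps1, eps2))
```

The synthetic test uses a circular shift, which real scans are not, so expect a noisier peak on actual documents. The shifted box can then be passed to img.crop() as before.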
I am trying to detect lines in a certain image. I run it through a skeletonization process before applying the cv2.HoughLinesP. I used the skeletonization code here.
No matter what I try I keep getting results similar to what is described here i.e. 'only fragments of a line..'
As suggested by Jiby, I use the named notation for the parameters and also high rho and theta, but to no avail.
Here is my code:
lines = cv2.HoughLinesP(skel, rho=5, theta=np.deg2rad(10), threshold=0, minLineLength=0, maxLineGap=0)
Prior to this I threshold an RGB image to extract most of my 'blue' hollow rectangle. Then I convert it to grayscale, which I then feed to the skeletonizer.
Please advise.
I have to translate some code from Octave to Python. Among other things, the program does something like this:
load_image = imread('image.bmp')
which, as you can see, loads a bitmap. Then if I do
size(load_image), it prints (1200,1600,3), which is OK. But when I do:
load_image
it prints what looks like a one-dimensional array, which does not make any sense to me. My question is: how are these values interpreted in Octave? I have to load the same image in OpenCV and I couldn't find a way.
thanks.
What you have is a 3D array in Octave. The first two dimensions are the rows and columns of the image, and the third dimension holds the R, G and B values for each pixel. When you print it, Octave dumps every value in the array one 2D slice after another, which is why it looks like a long 1D listing.
Try something like this and look at the output:
load_image(:,:,i)
Here i is the channel index (1, 2 or 3 for R, G and B), so each call prints one 2D plane of the image. If you want to print or display your 3D image as 2D planes using matplotlib or similar, you need to do the same.
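On the Python side the same slicing looks roughly like this. It is a sketch with a synthetic array standing in for the loaded bitmap; the one OpenCV-specific point to remember is that cv2.imread returns the channels in BGR order, so the last axis has to be reversed to line up with Octave's R, G, B planes.

```python
import numpy as np

# Stand-in for cv2.imread('image.bmp'): rows x columns x channels, BGR order.
img_bgr = np.zeros((1200, 1600, 3), dtype=np.uint8)
img_bgr[..., 0] = 255                       # channel 0 is blue in BGR

# Reverse the channel axis to get RGB, matching Octave's layout.
img_rgb = img_bgr[:, :, ::-1]

# Octave's load_image(:,:,1) (1-based) is index 0 here (0-based).
red = img_rgb[:, :, 0]
green = img_rgb[:, :, 1]
blue = img_rgb[:, :, 2]

print(img_rgb.shape, red.shape, img_rgb[0, 0])
```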
Imagine that some new image X arrives, and I want to know if X is new or has already been encountered before. I have code, below, that shrinks the image and then converts it to a hash code. I can then see via a single hash look-up if I've already encountered an image with the same hash code, so it's very fast.
My question is, is there an efficient way for me to see if a similar image, but one with a different hash code, has already been seen? I was going to title this question something like "Data structure for determining efficiently whether a similar, non-identical item is already contained" but decided that would be an instance of the XY problem.
When I say that this new image is "similar," I'm thinking of one that's perhaps gone through lossy compression and so looks like the original to the human eye but is not identical. Normally shrinking the image eliminates the difference, but not always, and if I shrink the image too much I start getting false positives.
Here's my current code:
from PIL import Image

seen_images = {} # This would really be a shelf or something

# From http://www.guguncube.com/1656/python-image-similarity-comparison-using-several-techniques
def image_pixel_hash_code(image):
    pixels = list(image.getdata())
    avg = sum(pixels) / len(pixels)
    # One bit per pixel: '1' if darker than the average, else '0'
    bits = "".join('1' if pixel < avg else '0' for pixel in pixels) # '00010100...'
    hexadecimal = format(int(bits, 2), '016x').upper()
    return hexadecimal

def process_image(filepath):
    # Shrink and convert to grayscale so small differences are smoothed out
    thumb = Image.open(filepath).resize((128, 128)).convert("L")
    code = image_pixel_hash_code(thumb)
    previous_image = seen_images.get(code)
    if previous_image is not None:
        print("'{}' already seen as '{}'".format(filepath, previous_image))
    else:
        seen_images[code] = filepath
You can put a path to a bunch of image files into a variable called IMAGE_ROOT and then try my code out with:
import os

for root, dirs, files in os.walk(IMAGE_ROOT):
    for filename in files:
        filepath = os.path.join(root, filename)
        try:
            process_image(filepath)
        except IOError:
            pass
There are a lot of methods for comparing images, but for your given example I suspect that simplicity and speed are the key factors (hence why you're trying to use a hash as a first-pass). Here are some suggestions - in all cases I'd suggest shrinking and cropping the image to a regular size and shape.
1. Smooth the image (gaussian blur) before shrinking to minimise the influence of artefacts. Then apply the hash or other comparison.
2. Subtract the images from one another (RGB) and check the remainder. Identical images will return zero, while compression artefacts will result in small variations. You can either threshold, sum, or average the value and compare to a cut-off.
3. Use standard distance algorithms (see scipy.spatial.distance) to calculate the 'distance' between the two images. For example, euclidean distance will give effectively the same result as summing the subtraction, while cosine will ignore intensity but match the profile of changes over the image, i.e. a darker version of the same image will be considered equivalent. For these you will need to flatten your image to a 1D array.
The last two entail comparing every image to every other image when uploading, and that is going to get very computationally expensive for large numbers of images.
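To illustrate the difference between the two distance measures mentioned above, here is a small sketch in plain NumPy (scipy.spatial.distance provides euclidean and cosine functions that should give the same numbers). The synthetic images and the helper names are my own assumptions, not part of the original code.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two images, flattened to 1D."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cosine similarity; insensitive to overall brightness scaling."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
original = rng.integers(1, 256, size=(32, 32)).astype(float)
darker = original * 0.5                     # same picture, half the intensity
noisy = np.clip(original + rng.normal(0, 2, original.shape), 0, 255)  # mild artefacts

print("euclidean:", euclidean_distance(original, darker),
      euclidean_distance(original, noisy))
print("cosine:   ", cosine_distance(original, darker),
      cosine_distance(original, noisy))
```

In this toy setup the darker copy is far away in euclidean terms but essentially identical under cosine distance, while the mildly noisy copy is close under both, so the choice of cut-off depends on which kind of difference should count as "the same image".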