Quickly determining using Python whether an image is (fuzzily) in a collection

Imagine that some new image X arrives, and I want to know whether X is new or has already been encountered before. I have code, below, that shrinks the image and then converts it to a hash code. I can then see via a single hash look-up whether I've already encountered an image with the same hash code, so it's very fast.
My question is: is there an efficient way for me to see if a similar image, but one with a different hash code, has already been seen? I was going to title this question something like "Data structure for determining efficiently whether a similar, non-identical item is already contained", but decided that would be an instance of the XY problem.
When I say that this new image is "similar," I'm thinking of one that's perhaps gone through lossy compression and so looks like the original to the human eye but is not identical. Normally shrinking the image eliminates the difference, but not always, and if I shrink the image too much I start getting false positives.
Here's my current code:
import PIL.Image

seen_images = {} # This would really be a shelf or something

# From http://www.guguncube.com/1656/python-image-similarity-comparison-using-several-techniques
def image_pixel_hash_code(image):
    pixels = list(image.getdata())
    avg = sum(pixels) / len(pixels)
    bits = "".join(map(lambda pixel: '1' if pixel < avg else '0', pixels)) # '00010100...'
    hexadecimal = int(bits, 2).__format__('016x').upper()
    return hexadecimal

def process_image(filepath):
    thumb = PIL.Image.open(filepath).resize((128,128)).convert("L")
    code = image_pixel_hash_code(thumb)
    previous_image = seen_images.get(code, None)
    if code in seen_images:
        print "'{}' already seen as '{}'".format(filepath, previous_image)
    else:
        seen_images[code] = filepath
You can put a path to a bunch of image files into a variable called IMAGE_ROOT and then try my code out with:
import os

for root, dirs, files in os.walk(IMAGE_ROOT):
    for filename in files:
        filepath = os.path.join(root, filename)
        try:
            process_image(filepath)
        except IOError:
            pass

There are a lot of methods for comparing images, but for your given example I suspect that simplicity and speed are the key factors (hence why you're trying to use a hash as a first-pass). Here are some suggestions - in all cases I'd suggest shrinking and cropping the image to a regular size and shape.
Smooth the image (gaussian blur) before shrinking to minimise the influence of artefacts, then apply the hash or other comparison (a rough sketch of this follows below).
Subtract the images from one another (RGB) and check the remainder. Identical images will return zero, while compression artefacts will result in minor variations. You can threshold, sum, or average the values and compare to a cut-off.
Use standard distance algorithms (see scipy.spatial.distance) to calculate the 'distance' between the two images. For example, Euclidean distance will give effectively the same result as summing the subtracted image, while cosine distance will ignore intensity but match the profile of changes over the image, i.e. a darker version of the same image will be considered equivalent. For these you will need to flatten each image to a 1D array.
The last two entail comparing every image to every other image when uploading, and that is going to get very computationally expensive for large numbers of images.
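As a rough illustration of the first suggestion, here is a sketch of my own (not the answerer's code), assuming Pillow and reusing the image_pixel_hash_code function from the question: blur before shrinking, then treat two images as near-duplicates when the Hamming distance between their hashes is below a small threshold (the 10-bit cut-off is a made-up starting point, not a tuned value).
from PIL import Image, ImageFilter

def robust_hash(filepath):
    # greyscale, smooth, then shrink, so compression artefacts don't flip hash bits;
    # a smaller 16x16 thumbnail (vs the question's 128x128) keeps the bit string short
    img = Image.open(filepath).convert("L").filter(ImageFilter.GaussianBlur(radius=2))
    return image_pixel_hash_code(img.resize((16, 16)))  # hash function from the question

def hamming(hex_a, hex_b):
    # number of differing bits between two hex hash strings
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count('1')

def looks_seen(code, seen_codes, max_bits=10):
    # linear scan over all stored hashes
    return any(hamming(code, old) <= max_bits for old in seen_codes)
Note that looks_seen still compares the new hash against every stored one, so it softens rather than removes the scaling concern above; for a large collection you would want something like a BK-tree keyed on Hamming distance.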

Related

Problem with imgIdx in DMatch class using FlannBasedMatcher in Python

I have the same issue as here:
how to access best image corresponding to best keypoint match using opencv flannbasedmatcher and dmatch
Unfortunately, this post doesn't have an answer.
I have several images (and corresponding descriptors), that I add to the FlannBasedMatcher, using the 'add' method (once for each set of descriptors, corresponding to a single image).
However, when I match an image, the returned imgIdx is way larger than the number of images in the training set. It feels as if each descriptor is treated as its own image, but that is not what I want.
I want to know which image (or set of descriptors) each feature has been matched to.
Here is a part of my code (I simplified it a bit, and I know 'test' is not great for a variable name, but it's temporary).
Also here I read .key files, which are basically files containing keypoints and descriptors of an image (extracted with SIFT).
To be clear, in the following code featMatch is just a small class I created that wraps a FlannBasedMatcher (with its initialization parameters).
with open(os.path.join(ROOT_DIR,"images\\descriptor_list.txt"),'r') as f:
    for line in f:
        folder_path = os.path.join(ROOT_DIR,"images\\",line[:-1]+"\\","*.key")
        list_key = glob.glob(folder_path)
        test2 = []
        for key in list_key:
            if os.path.isfile(key):
                feat = Features()
                feat.readFromFile(key)
                test = feat.descriptors
                test2 = test2+test
        featMatch.add(test2)

# Read submitted picture features
feat = Features()
feat.readFromFile(os.path.join(ROOT_DIR,"submitted_picture\\sub.key"))

matches = []
matches.append(featMatch.knnMatch(np.array(feat.descriptors), k=3))

print(matches)
I was expecting, when looking at the matches, and more specifically at the imgIdx of the matches, to be told which image index the matching feature (trainIdx) corresponds to, based on the number of descriptor sets I added with the 'add' method.
Following that assumption, I should never get an imgIdx larger than the number of images (or descriptor sets) in my training set.
However, here, I get numbers such as 2960, while I only have about 5 images in my training set.
My guess is that it returns the feature index instead of the image index, but I don't know why.
I noticed that the 'add' method in C++ takes an array of arrays, i.e. a list of descriptor sets (one per image, I guess). But I have a different number of features for each image, so I can't build one rectangular numpy array out of them.
Thanks.
I finally figured it out after looking at the C++ source code of matcher.cpp:
https://github.com/opencv/opencv/blob/master/modules/features2d/src/matchers.cpp
I'm gonna post the answer, in case somebody needs it someday.
I thought that the 'add' method would increment the image count when called, but it does not. So, I realized that I have to create a list of Mat (or numpy array in python) and give it once to 'add', instead of calling it for each image.
So here is the updated (and working) source code:
with open(os.path.join(ROOT_DIR,"images\\descriptor_list.txt"),'r') as f:
    list_image_descriptors = []
    for line in f:
        folder_path = os.path.join(ROOT_DIR,"images\\",line[:-1]+"\\","*.key")
        list_key = glob.glob(folder_path)
        for key in list_key:
            if os.path.isfile(key):
                feat = Features()
                feat.readFromFile(key)
                img_descriptors = np.array(feat.descriptors)
                list_image_descriptors.append(img_descriptors)
    featMatch.add(list_image_descriptors)

# Read submitted picture features
feat = Features()
feat.readFromFile(os.path.join(ROOT_DIR,"submitted_picture\\sub.key"))

matches = []
matches.append(featMatch.knnMatch(np.array(feat.descriptors), k=3))

print(matches)
Hope this helps.
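For completeness, a small sketch of mine (not part of the original post) showing how imgIdx can be turned back into a file name, assuming a hypothetical image_paths list that records the .key files in the same order their descriptor arrays were appended to list_image_descriptors:
# knnMatch returns one list of k DMatch objects per query descriptor
for knn in matches[0]:
    for m in knn:
        print("query %d -> image %d (%s), distance %.2f"
              % (m.queryIdx, m.imgIdx, image_paths[m.imgIdx], m.distance))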

Discover transformation required to align images of standardized documents

My question is not too far off from the "Image Alignment (ECC) in OpenCV ( C++ / Python )" article.
I also found the following article about facial alignment to be very interesting, but WAY more complex than my problem.
Wow! I can really go down the rabbit-hole.
My question is WAY more simple.
I have a scanned document that I have treated as a "template". In this template I have manually mapped the pixel regions that I require info from as:
area = (x1,y1,x2,y2)
such that x1<x2, y1<y2.
Now, these regions are, as is likely obvious, a bit too specific to my "template".
All other files that I want to extract data from are mostly shifted by some unknown amount such that their true area for my desired data is:
area = (x1 + ε1, y1 + ε2, x2 + ε1, y2 + ε2)
Where ε1, ε2 are unknown in advance.
But the documents are otherwise HIGHLY similar outside of this shift.
I want to discover, ideally through OpenCV, what translation is required (for the time being ignoring rotation and the full Euclidean transform) to "align" these images, so as to discover my ε values, shift my area, and parse my data directly.
I have thought about using tesseract to mine the text from the document and then parse from there, but there are check boxes, either filled or empty, that contain meaningful information for my problem.
The code I currently have for cropping the image is:
from PIL import Image
img = Image.open(img_path)
area = area_lookup['key']
cropped_img = img.crop(area)
cropped_img.show()
My two sample files are attached (the images are not reproduced here).
We can assume my first image is my "template".
As you can see, the two images are very "similar" but one is moved slightly (human error). There may be cases where the rotation is more extreme, or the image is shifted more.
I would like to transform image 2 so that it is aligned to image 1 as closely as possible, and then parse data from it.
Any help would be sincerely appreciated.
Thank you very much
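(Not being the original poster, I can only offer a minimal sketch of one possible approach: estimating the pure translation with OpenCV's phase correlation, assuming the template and the shifted scan are saved as template.png and scan.png and have the same dimensions. The estimated offsets then shift the hand-mapped area before cropping.)
import cv2
import numpy as np
from PIL import Image

# phaseCorrelate needs single-channel float images of identical size
template = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
scan = cv2.imread('scan.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)

(dx, dy), response = cv2.phaseCorrelate(template, scan)
# The sign convention is easy to get backwards: verify on a known pair
# and negate dx, dy if the crop drifts the wrong way.

x1, y1, x2, y2 = area_lookup['key']  # region mapped on the template
shifted = tuple(int(round(v)) for v in (x1 + dx, y1 + dy, x2 + dx, y2 + dy))

cropped_img = Image.open('scan.png').crop(shifted)
cropped_img.show()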

read an array of pixel values python

I would like to take a screenshot of a certain region of the screen, and then check the pixel values along certain lines (e.g. along the x axis from 400 to 800).
I tried multiple approaches, such as ImageGrab, gdi32.GetPixel and a few others. Reading pixel values seems to take a lot of time, so I even tried converting the image into a list, something like this:
im = ImageGrab.grab(box)
pixels = list(im.getdata())
Even this does not seem fast. Is there something I'm doing wrong?
ImageGrab returns a PIL image (the Python Imaging Library: http://effbot.org/imagingbook/image.htm), and .getdata() already exposes the pixels as a sequence. Wrapping that in list() copies the whole pixel sequence a second time, which is expensive. You can just do:
im = ImageGrab.grab(box)
pixels = im.getdata()
And iterate through your pixels in your favorite way.
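If a plain Python loop over getdata() is still too slow, one common alternative (my suggestion, not part of the answer above) is to convert the grab to a numpy array and slice out the rows you need:
import numpy as np
from PIL import ImageGrab

box = (0, 0, 1000, 600)   # example capture region (left, top, right, bottom)
im = ImageGrab.grab(box)
arr = np.asarray(im)      # shape is (height, width, channels)

row = 300                 # hypothetical row of interest
line = arr[row, 400:800]  # pixel values for x = 400..799 on that row
print(line.shape)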

Pixel to pixel edit using PIL and Image.point

I can't seem to understand what Image.point does. I want to do some pixel editing, which might include checking which color value (r, g or b) is the maximum in every pixel and acting accordingly. Let's say that I can't use numpy. I managed to use Image.point to add the same value to every pixel in an image.
point code
import Image, math

def brightness(i, value):
    value = math.floor(255*(float(value)/100))
    return i+value

if __name__ == '__main__':
    image = '/home/avlahop/verybig.jpg'
    print image
    img = Image.open(image)
    print img
    out = img.point(lambda i: brightness(i, 50))
    out.show()
numpy code
import math
import Image
import numpy as np

def brightness(arr, adjust):
    adjust = math.floor(255*(float(adjust)/100))
    arr[...,0] += adjust
    arr[...,1] += adjust
    arr[...,2] += adjust
    return arr

if __name__ == '__main__':
    image = '/home/avlahop/verybig.jpg'
    adjust = 50  # brighten by 50%, matching the point example
    img = Image.open(image).convert('RGBA')
    arr = np.array(np.asarray(img).astype('float'))
    new_image = Image.fromarray(brightness(arr, adjust).clip(0,255).astype('uint8'), 'RGBA').show()
I have to say that the point code is faster than the numpy version. But what if I want to do a more complex operation with point, for example checking max(r, g, b) for every pixel and acting differently depending on whether r, g or b is the maximum? As you saw, I used point with a function as its argument. It takes one argument i. What is this i? Is it the pixel (i.e. i = (r, g, b))? I can't work that out from the PIL documentation.
The docs may not have been clear in earlier versions of PIL, but in Pillow it's spelled out pretty well. From Image.point:
lut – A lookup table, containing 256 values per band in the image. A function can be used instead, it should take a single argument. The function is called once for each possible pixel value, and the resulting table is applied to all bands of the image.
In other words, it's not a general-purpose way to map each pixel through a function, it's just a way to dynamically build the lookup table instead of passing in a pre-built one.
In other words, it's called with the numbers from 0 through 255. (Which you can find out for yourself pretty easily by just writing a function that appends its argument to a global list and then dumping out the list at the end…)
If you split your image into separate bands or planes, point each one of them with a different function, and then recombine them, that might be able to accomplish what you're trying to do. But even then, I think eval is what you wanted, not point.
But I think what you really want is a pixel-by-pixel, all-bands-at-once iterator, and you don't need anything special for that: just use map or a comprehension over getdata (see the sketch below). Isn't that slow? Of course it's slow, because it's calling your function X*Y times; but the cost of building the getdata sequence and iterating over it is tiny compared to that cost, so looking for a way for PIL to optimize the already-fast-enough part won't get you very far.
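Here is a minimal sketch of that getdata comprehension (my code, not the answerer's), boosting whichever band is the maximum in each pixel, which is exactly the kind of per-pixel, all-bands decision that point cannot express:
from PIL import Image

def boost_max_band(pixel, amount=30):
    r, g, b = pixel
    m = max(r, g, b)
    # add `amount` only to the band(s) equal to the maximum, clamped to 255
    return tuple(min(255, c + amount) if c == m else c for c in (r, g, b))

img = Image.open('/home/avlahop/verybig.jpg').convert('RGB')
out = Image.new('RGB', img.size)
out.putdata([boost_max_band(p) for p in img.getdata()])
out.show()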

Categorize different images

I have a number of images from Chinese genealogies, and I would like to be able to programmatically categorize them. Generally speaking, one type of image has primarily line-by-line text, while the other type may be in a grid or chart format.
Example photos
'Desired' type: http://www.flickr.com/photos/63588871@N05/8138563082/
'Other' type: http://www.flickr.com/photos/63588871@N05/8138561342/in/photostream/
Question: Is there a (relatively) simple way to do this? I have experience with Python, but little knowledge of image processing. Direction to other resources is appreciated as well.
Thanks!
Assuming that at least some of the grid lines are exactly or almost exactly vertical, a fairly simple approach might work.
I used PIL to find all the columns in the image where more than half of the pixels were darker than some threshold value.
Code
import Image, ImageDraw # PIL modules

withlines = Image.open('withgrid.jpg')
nolines = Image.open('nogrid.jpg')

def findlines(image):
    w, h = image.size
    s = w*h
    im = image.point(lambda i: 255 * (i < 60)) # threshold
    d = im.getdata() # faster than per-pixel operations
    linecolumns = []
    for col in range(w):
        black = sum( (d[x] for x in range(col, s, w)) )//255
        if black > 450:
            linecolumns += [col]
    # return an image showing the detected lines
    im2 = image.convert('RGB')
    draw = ImageDraw.Draw(im2)
    for col in linecolumns:
        draw.line( (col,0,col,h-1), fill='#f00', width = 1)
    return im2

findlines(withlines).show()
findlines(nolines).show()
Results
(Output images not reproduced here; they showed the detected vertical lines drawn in red for illustration.)
As you can see, four of the grid lines are detected, and, with some processing to ignore the left and right sides and the center of the book, there should be no false positives on the desired type.
This means that you could use the above code to detect black columns and discard those that are near the edges or the center. If any black columns remain, classify the page as the "other", undesired class of pictures (a rough sketch follows below).
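Here is a rough sketch of that rule (mine, building on findlines above and assuming the pages are opened in greyscale mode); the margin is a guess to be tuned on real scans, while the pixel thresholds are carried over from findlines:
def classify(image, margin=40):
    w, h = image.size
    d = image.point(lambda i: 255 * (i < 60)).getdata()  # same threshold as findlines
    cols = [col for col in range(w)
            if sum(d[x] for x in range(col, w * h, w)) // 255 > 450]
    # drop columns near the page edges or the centre fold of the book
    interior = [c for c in cols if margin < c < w - margin and abs(c - w // 2) > margin]
    return 'other' if interior else 'desired'

# usage, continuing from the code above:
#   classify(withlines)  ->  'other'
#   classify(nolines)    ->  'desired'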
AFAIK, there is no easy way to solve this. You will need a decent amount of image processing and some basic machine learning to classify these kinds of images (and even then it probably won't be 100% successful).
Another note:
While this could be solved using machine learning techniques alone, I would advise you to look into some image processing techniques first and try to convert your images into a form in which the two classes differ clearly. For that you'd best start reading about the FFT. After that, have a look at some digital image processing techniques. When you feel you have a decent understanding of these, you can read up on pattern recognition.
This is only one suggested approach though, there are more ways to achieve this.
