I have an image created by many polygons of different solid colors. The coordinates themselves are not given, but can be detected if necessary.
I'm looking for a way to detect all points which are the intersection of 3 or more different colors. The colors are not known in advance and might be similar to each other (e.g. one might be (255, 255, 250) and another (255, 255, 245); the specific shade doesn't matter, just the fact that it is different).
For example, in the following image a tiny star marks all the points that I'm looking for.
As your annotations have obscured the intersections you are trying to identify, I made a new, similar image.
Rather than trying to bend my brain around dealing with 3 dimensions of 8-bit RGB colour, I converted each pixel to a single 24-bit integer, then ran a generic filter from SciPy that counts the number of unique colours in each 3x3 window and made a new image from that. So each pixel in the result has a brightness value equal to the number of colours in its neighbourhood. I counted the colours by converting the NumPy array of neighbours into a Python set, exploiting the fact that a set can only contain unique values.
#!/usr/bin/env python3

import numpy as np
from PIL import Image
from scipy.ndimage import generic_filter

# CountUnique
def CountUnique(P):
    """
    We receive P[0]..P[8] with the pixels in the 3x3 surrounding window, return count of unique values
    """
    return len(set(P))

# Open image and make into Numpy array
PILim = Image.open('patches.png').convert('RGB')
RGBim = np.array(PILim)

# Make a single channel 24-bit image rather than 3 channels of 8-bit each
RGB24 = (RGBim[...,0].astype(np.uint32)<<16) | (RGBim[...,1].astype(np.uint32)<<8) | RGBim[...,2].astype(np.uint32)

# Run generic filter counting unique colours in neighbourhood
result = generic_filter(RGB24, CountUnique, (3, 3))

# Save result
Image.fromarray(result.astype(np.uint8)).save('result.png')
The resultant image is shown here, with the contrast stretched so that you can see the brightest pixels at the intersections you seek.
A histogram of the values in the result image shows there are 21 pixels which have 3 unique colours in their 3x3 neighbourhood and 4,348 pixels which have 2 unique colours in their neighbourhood. You can find these by running np.where(result==3), for example.
Histogram:
155631: ( 1, 1, 1) #010101 gray(1)
4348: ( 2, 2, 2) #020202 gray(2)
21: ( 3, 3, 3) #030303 gray(3)
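For example, a short snippet to turn that result array into a list of coordinates of the marked points:

# rows and columns of all pixels whose 3x3 neighbourhood contains 3 unique colours
ys, xs = np.where(result == 3)
for y, x in zip(ys, xs):
    print(f"intersection near x={x}, y={y}")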
For extra fun, I had a go at programming the method suggested by @Micka, and that gives the same results; the code looks like this:
#!/usr/bin/env python3

import numpy as np
from PIL import Image
from skimage.morphology import dilation, disk

# Open image and make into Numpy array
PILim = Image.open('patches.png').convert('RGB')
RGBim = np.array(PILim)
h, w = RGBim.shape[0], RGBim.shape[1]

# Make a single channel 24-bit image rather than 3 channels of 8-bit each
RGB24 = (RGBim[...,0].astype(np.uint32)<<16) | (RGBim[...,1].astype(np.uint32)<<8) | RGBim[...,2].astype(np.uint32)

# Make list of unique colours
UniqueColours = np.unique(RGB24)

# Create result image
result = np.zeros((h,w), dtype=np.uint8)

# Make mask for any particular colour - same size as original image
mask = np.zeros((h,w), dtype=np.uint8)

# Make disk-shaped structuring element for morphology
selem = disk(1)

# Iterate over unique colours
for i,u in enumerate(UniqueColours):
    # Turn on all pixels matching this unique colour, turn off all others
    mask = np.where(RGB24==u, 1, 0)
    # Dilate (fatten) the mask by 1 pixel
    mask = dilation(mask, selem)
    # Add all activated pixels to result image
    result = result + mask

# Save result
Image.fromarray(result.astype(np.uint8)).save('result.png')
For reference, I created the image with anti-aliasing disabled in ImageMagick at the command line like this:
convert -size 400x400 xc:red -background red +antialias \
-fill blue -draw "polygon 42,168 350,72 416,133 416,247 281,336" \
-fill yellow -draw "polygon 271,11 396,127 346,154 77,86" \
-fill lime -draw "polygon 366,260 366,400 120,400" patches.png
Keywords: Python, image, image processing, intersect, intersection, PIL/Pillow, adjacency, neighbourhood, neighborhood, neighbour, neighbor, generic, SciPy, 3x3, filter.
I want to get the image difference for a print captured using a camera.
I tried many solutions using Python libraries: OpenCV, ImageMagick, etc.
The approach I found for better comparison accuracy is:
Move the image left to right and look for the minimum difference.
Move the image right to left and look for the minimum difference.
Move the image top to bottom and look for the minimum difference.
Move the image bottom to top and look for the minimum difference.
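A rough sketch of this shift-and-compare idea (for illustration only; file names are placeholders, both images are assumed to be the same size, and the search window is an arbitrary guess):

import cv2
import numpy as np

# placeholder file names; load both prints as grayscale
ref = cv2.imread('reference.png', cv2.IMREAD_GRAYSCALE)
test = cv2.imread('captured.png', cv2.IMREAD_GRAYSCALE)

best_shift, best_score = (0, 0), float('inf')
# small search window in pixels (an assumption; tune to the expected misalignment)
for dy in range(-10, 11):
    for dx in range(-10, 11):
        M = np.float32([[1, 0, dx], [0, 1, dy]])
        shifted = cv2.warpAffine(test, M, (test.shape[1], test.shape[0]))
        score = int(np.sum(cv2.absdiff(ref, shifted)))
        if score < best_score:
            best_shift, best_score = (dx, dy), score

print("best shift:", best_shift, "difference:", best_score)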
Conditions for capturing the image:
1. The camera will never move (it is mounted on a fixed stand).
2. The object is placed manually on a white sheet, so it will never be perfectly aligned (slight variation in angle every time, as it is manual).
Image samples captured using the camera for the code below:
Image sample 1: white Dots :
Image sample 2: as original image
Image sample 3: black dots
The expected output for the print with white dots is not available, but it should only mark the difference (defect):
Currently I am using the following ImageMagick command for the image difference:
compare -highlight-color black -fuzz 5% -metric AE Image_1.png Image_2.png -compose src diff.png
Code :
import subprocess
# -fuzz 5% # ignore minor difference between two images
cmd = 'compare -highlight-color black -fuzz 5% -metric AE Input.png output.png -compose src diff.png '
subprocess.call(cmd, shell=True)
The output of the difference is incorrect because the comparison works pixel to pixel; it is not smart enough to mark only the real difference:
The shifting approach I described above would produce the required difference as output, but there is no library or ImageMagick command available for such an image comparison.
Any python code OR Image-magic command for doing this?
It seems you are doing a defect detection task. The first solution that comes to my mind is the image registration technique.
First, try to take the images under the same conditions (lighting, camera angle, etc.); note that one of your provided images is 2 pixels bigger.
Then you should register the two images and match one to the other, like this:
Then warp them with the help of the homography matrix and generate an aligned image; in this case, the result is like this:
Then take the difference of the aligned image with the query image and threshold it; the result:
As I said, if you capture your frames more precisely, the registration result will be better and lead to more accurate performance.
The code for each part (mostly taken from here):
import cv2
import numpy as np

MAX_FEATURES = 1000
GOOD_MATCH_PERCENT = 0.5

def alignImages(im1, im2):
    # Convert images to grayscale
    im1Gray = cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)
    im2Gray = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)

    # Detect ORB features and compute descriptors.
    orb = cv2.ORB_create(MAX_FEATURES)
    keypoints1, descriptors1 = orb.detectAndCompute(im1Gray, None)
    keypoints2, descriptors2 = orb.detectAndCompute(im2Gray, None)

    # Match features.
    matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING)
    matches = matcher.match(descriptors1, descriptors2, None)

    # Sort matches by score
    matches = sorted(matches, key=lambda x: x.distance)

    # Remove not so good matches
    numGoodMatches = int(len(matches) * GOOD_MATCH_PERCENT)
    matches = matches[:numGoodMatches]

    # Draw top matches
    imMatches = cv2.drawMatches(im1, keypoints1, im2, keypoints2, matches, None)
    cv2.imwrite("matches.jpg", imMatches)

    # Extract location of good matches
    points1 = np.zeros((len(matches), 2), dtype=np.float32)
    points2 = np.zeros((len(matches), 2), dtype=np.float32)
    for i, match in enumerate(matches):
        points1[i, :] = keypoints1[match.queryIdx].pt
        points2[i, :] = keypoints2[match.trainIdx].pt

    # Find homography
    h, mask = cv2.findHomography(points1, points2, cv2.RANSAC)

    # Use homography to warp im1 onto im2
    height, width, channels = im2.shape
    im1Reg = cv2.warpPerspective(im1, h, (width, height))

    return im1Reg


if __name__ == '__main__':
    # Read reference image and the image to be aligned
    refFilename = "vv9gFl.jpg"
    imFilename = "uP3CYl.jpg"
    imReference = cv2.imread(refFilename, cv2.IMREAD_COLOR)
    im = cv2.imread(imFilename, cv2.IMREAD_COLOR)

    # The registered (aligned) image will be stored in imReg.
    imReg = alignImages(im, imReference)

    # Write aligned image to disk.
    outFilename = "aligned.jpg"
    cv2.imwrite(outFilename, imReg)
For the image difference and thresholding:

# Load the aligned image and the reference, cropped to the same width
aligned = cv2.imread("aligned.jpg", 0)
aligned = aligned[:, :280]
b = cv2.imread("vv9gFl.jpg", 0)
b = b[:, :280]

print(aligned.shape)
print(b.shape)

# Absolute difference between the aligned image and the reference
diff = cv2.absdiff(aligned, b)
cv2.imwrite("diff.png", diff)

# Threshold the difference to keep only significant changes
threshold = 25
aligned[np.where(diff > threshold)] = 255
aligned[np.where(diff <= threshold)] = 0
cv2.imwrite("threshold.png", aligned)
If you have lots of images and want to do a defect detection task, I suggest using a Denoising Autoencoder to train a deep artificial neural network. Read more here.
Although you do not want point-by-point processing, here is a subimage-search compare using ImageMagick. It pads one image after cropping off the black and then shifts the smaller image to find the best match locations within the larger one.
crop image1:
convert image1.jpg -gravity north -chop 0x25 image1c.png
crop and pad image2:
convert image2.jpg -gravity north -chop 0x25 -gravity center -bordercolor "rgb(114,151,157)" -border 20x20 image2c.png
do subimage search
compare -metric rmse -subimage-search image2c.png image1c.png null:
1243.41 (0.0189732) # 22,20
now shift and get difference between the two images
convert image2c.png image1c.png -geometry +22+20 -compose difference -composite -shave 22x20 -colorspace gray -auto-level +level-colors white,red diff.png
ADDITION:
If you want to just use compare, then you need to add -fuzz 15% to the compare command:
compare -metric rmse -fuzz 15% -subimage-search image2c.png image1c.png diff.png
Two images are produced. The difference image is the first, so look at diff-0.png
I am analyzing medical images. All images have a marker with the position. It looks like this
It is the "TRH RMLO" annotation in this image, but it can be different in other images. Also the size varies. The image is cropped but you see that the tissue is starting on the right side.
I found that the presence of these markers distort my analysis.
How can I remove them?
I load the image in Python like this:

import dicom
import numpy as np

img = dicom.read_file('my_image.dcm')
img_array = img.pixel_array
The image is then a numpy array. The white text is always surrounded by a large black area (black has value zero). The marker is in a different position in each image.
How can I remove the white text without hurting the tissue data?
UPDATE
added a second image
UPDATE2:
Here are two of the original DICOM files. All personal information has been removed. (edit: removed)
Looking at the actual pixel values of the image you supplied, you can see that the marker is almost (99.99%) pure white and this doesn't occur elsewhere in the image so you can isolate it with a simple 99.99% threshold.
I prefer ImageMagick at the command-line, so I would do this:
convert sample.dcm -threshold 99.99% -negate mask.png
convert sample.dcm mask.png -compose darken -composite result.jpg
Of course, if the sample image is not representative, you may have to work harder. Let's look at that...
If the simple threshold doesn't work for your images, I would look at "Hit and Miss Morphology". Basically, you threshold your image to pure black and white - at around 90% say, and then you look for specific shapes, such as the corner markers on the label. So, if we want to look for the top-left corner of a white rectangle on a black background, and we use 0 to mean "this pixel must be black", 1 to mean "this pixel must be white" and - to mean "we don't care", we would use this pattern:
0 0 0 0 0
0 1 1 1 1
0 1 - - -
0 1 - - -
0 1 - - -
Hopefully you can see the top left corner of a white rectangle there. That would be like this in the Terminal:
convert sample.dcm -threshold 90% \
-morphology HMT '5x5:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' result.png
Now we also want to look for top-right, bottom-left and bottom-right corners, so we need to rotate the pattern, which ImageMagick handily does when you add the > flag:
convert sample.dcm -threshold 90% \
-morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' result.png
Hopefully you can see dots demarcating the corners of the logo now, so we could ask ImageMagick to trim the image of all extraneous black and just leave the white dots and then tell us the bounding box:
convert sample.dcm -threshold 90% \
-morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' -format %@ info:
308x198+1822+427
So, if I now draw a red box around those coordinates, you can see where the label has been detected - of course in practice I would draw a black box to cover it but I am explaining the idea:
convert sample.dcm -fill "rgba(255,0,0,0.5)" -draw "rectangle 1822,427 2130,625" result.png
If you want a script to do that automagically, I would use something like this, saving it as HideMarker:
#!/bin/bash
input="$1"
output="$2"
# Find corners of overlaid marker using Hit and Miss Morphology, then get crop box
IFS="x+" read w h x1 y1 < <(convert "$input" -threshold 90% -morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' -format %@ info:)
# Calculate bottom-right corner from top-left and dimensions
((x1=x1-1))
((y1=y1-1))
((x2=x1+w+1))
((y2=y1+h+1))
convert "$input" -fill black -draw "rectangle $x1,$y1 $x2,$y2" "$output"
Then you would do this to make it executable:
chmod +x HideMarker
And run it like this:
./HideMarker someImage.dcm result.png
I have another idea. This solution uses OpenCV with Python. It is a rather simple solution.
First, obtain the binary threshold of the image.
ret,th = cv2.threshold(img,2,255, 0)
Perform morphological dilation (the kernel is not shown in the original snippet; an elliptical kernel is assumed here):
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))  # assumed kernel
dilate = cv2.morphologyEx(th, cv2.MORPH_DILATE, kernel, iterations=3)
To join the gaps, I then used median filtering:
median = cv2.medianBlur(dilate, 9)
Now you can use the contour properties to eliminate the smallest contour and retain the other one, which contains the tissue, as sketched below.
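A minimal sketch of that contour step (an illustration only, assuming OpenCV 4's two-value findContours return and the img and median arrays from above):

import cv2
import numpy as np

# find external contours on the cleaned-up mask
contours, _ = cv2.findContours(median, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# keep only the largest contour (the tissue) and black out everything else
largest = max(contours, key=cv2.contourArea)
tissue_mask = np.zeros_like(median)
cv2.drawContours(tissue_mask, [largest], -1, 255, -1)
cleaned = cv2.bitwise_and(img, img, mask=tissue_mask)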
It also works for the second image:
If these annotations are in the DICOM file there are a couple ways they could be stored (see https://stackoverflow.com/a/4857782/1901261). The currently supported method can be cleaned off by simply removing the 60xx group attributes from the files.
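For illustration, since the question loads the files in Python, stripping the 60xx overlay groups could be sketched with pydicom roughly like this (an assumption, not a tested recipe for your files):

import pydicom

ds = pydicom.dcmread('my_image.dcm')

# collect every element whose tag lives in an overlay group (0x6000-0x60FF)
overlay_tags = [elem.tag for elem in ds if 0x6000 <= elem.tag.group <= 0x60FF]
for tag in overlay_tags:
    del ds[tag]

ds.save_as('my_image_no_overlays.dcm')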
For the deprecated method (which is still commonly used) you can clear out the unused high bit annotations manually without messing up the other image data as well. Something like:
int position = object.getInt( Tag.OverlayBitPosition, 0 );
if( position == 0 ) return;

int bit = 1 << position;
int[] pixels = object.getInts( Tag.PixelData );

int count = 0;
for( int pix : pixels )
{
    int overlay = pix & bit;
    pixels[ count++ ] = pix - overlay;
}

object.putInts( Tag.PixelData, VR.OW, pixels );
If these are truly burned into the image data, you're probably stuck using one of the other recommendations here.
The good thing is that these watermarks are probably in an isolated, totally black area, which makes it easier (although it's questionable whether removing them is in line with the intended usage; licensing stuff).
Without being an expert, here is one idea. It might be a sketch of a very, very powerful approach tailored to this problem, but you have to decide whether the implementation complexity and algorithmic complexity (very dependent on image statistics) are worth it:
Basic idea
Detect the semi-cross like borders (4)
Calculate the defined rectangle from these
Black-out this rectangle
Steps
0. Binarize.
1. Use some gradient-based edge detector to get all the horizontal edges.
There may be multiple; you can try to impose a minimum length (maybe some morphology is needed to connect pixels which are not connected because of noise in the source or the algorithm).
2. Use some gradient-based edge detector to get all the vertical edges.
Like the above, but with the other orientation.
3. Do a connected-component calculation to get objects which are vertical and horizontal lines.
Now you can try different choices of candidate components (8 real ones) with the following knowledge:
- Two of these components can be described by the same line (slope-intercept form; a linear-regression problem) -> a line which borders the rectangle.
- It's probable that the best 4 pair choices (according to the linear-regression loss) are the valid borders of this rectangle.
- You might add the assumption that vertical borders and horizontal borders are orthogonal to each other.
4. Calculate the rectangle from these borders, widen it by a few pixels (hyper-parameter), and black out that rectangle.
That's the basic approach.
Alternative
This one is much less work, uses more specialized tools, and assumes the facts stated in the opening:
the stuff to remove is on some completely black part of the image
it's kind of isolated; distance to medical-data is high
Steps
Run some general OCR to detect characters
Get the occupied pixels / borders somehow (I'm not sure what OCR tools return)
Calculate some outer rectangle and black-out (using some predefined widening-gap; this one needs to be much bigger than the one above)
Alternative 2
Sketch only: the idea is to use something like binary closing on the image to build fully connected components out of the source pixels (while small gaps/holes are filled), so that we get one big component describing the medical data and one for the watermark. Then just remove the smaller one.
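A rough sketch of that idea (an illustration only, using scipy.ndimage on the img_array from the question; the threshold and structuring-element size are guesses):

import numpy as np
from scipy import ndimage

def keep_largest_component(img_array, threshold=0):
    # binarize: everything brighter than the (assumed) background threshold is "content"
    binary = img_array > threshold
    # close small gaps/holes so each object becomes one connected blob
    closed = ndimage.binary_closing(binary, structure=np.ones((15, 15)))
    # label connected components and keep only the largest one (the tissue)
    labels, n = ndimage.label(closed)
    if n < 2:
        return img_array
    sizes = ndimage.sum(closed, labels, range(1, n + 1))
    largest = int(np.argmax(sizes)) + 1
    cleaned = img_array.copy()
    cleaned[labels != largest] = 0
    return cleaned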
I am sure this can be optimized, but ... You could create 4 patches of size 3x3 or 4x4, and initialize them with the exact content of the pixel values for each of the individual corners of the frame surrounding the annotation text. You could then iterate over the whole image (or have some smart initialization looking only in the black area) and find the exact match for those patches. It is not very likely you will have the same regular structure (90 deg corner surrounded by near 0) in the tissue, so this might give you the bounding box.
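As an illustration only, a brute-force version of that exact-match search (with a hypothetical find_patch helper; the patches would be cut from known annotation corners) could look like:

import numpy as np

def find_patch(img, patch):
    """Return (row, col) positions where patch occurs exactly in img."""
    ph, pw = patch.shape
    hits = []
    for r in range(img.shape[0] - ph + 1):
        for c in range(img.shape[1] - pw + 1):
            if np.array_equal(img[r:r + ph, c:c + pw], patch):
                hits.append((r, c))
    return hits

# e.g. corner_patch = img_array[y0:y0 + 4, x0:x0 + 4] cut from a known annotation corner
# hits = find_patch(img_array, corner_patch)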
A simpler approach is still possible!
Just add the following after (img_array = img.pixel_array):
img_array[img_array > X] = Y
Here X is the intensity threshold above which you want to eliminate pixels, and Y is the intensity value to put in their place.
For example:
img_array[img_array > 4000] = 0
This replaces all white matter with intensity greater than 4000 by black (intensity 0).
I have an image of a sticky note on a background (say a wall, or a laptop) and I want to detect the edges of the sticky note (rough detection also works fine) so that I can run a crop on it.
I plan on using ImageMagick for the actual cropping, but am stuck on detecting the edges.
Ideally, my output should give me 4 coordinates for the 4 border points so I can run my crop on it.
How should I proceed with this?
You can do that with ImageMagick.
There are different IM methods one can come up with. Here is the first algorithm which came to mind for me. It assumes the "sticky notes" are not tilted or rotated on the larger image:
First stage: use canny edge detection to reveal the edges of the sticky note.
Second stage: determine the coordinates of the edges.
Canny Edge Detection
This command will create a black+white image depicting all edges in the original image:
convert \
http://i.stack.imgur.com/SxrwG.png \
-canny 0x1+10%+30% \
canny-edges.png
Determine Coordinates of Edges
Assuming the image is sized XxY pixels, you can resize it into a 1xY column and an Xx1 row of pixels, where each resulting pixel's color value is the average of all the original pixels that were in the same row (for the 1xY column image) or the same column (for the Xx1 row image).
As an example which can be seen below, I'll first resize the new canny-edges.png to 4xY and Xx4 images:
identify -format " %W x %H\n" canny-edges.png
400x300
convert canny-edges.png -resize 400x4\! canny-4cols.png
convert canny-edges.png -resize 4x300\! canny-4rows.png
canny-4cols.png
canny-4rows.png
Now that the previous images visualized what the compression-resizing of an image into a few columns or rows of pixels will achieve, let's do it with a single column and a single row. At the same time we'll change the output format to text, not PNG, in order to get the coordinates of these pixels which are white:
convert canny-edges.png -resize 400x1\! canny-1col.txt
convert canny-edges.png -resize 1x300\! canny-1row.txt
Here is part of the output from canny-1col.txt:
# ImageMagick pixel enumeration: 400,1,255,gray
0,0: (0,0,0) #000000 gray(0)
1,0: (0,0,0) #000000 gray(0)
2,0: (0,0,0) #000000 gray(0)
[....]
73,0: (0,0,0) #000000 gray(0)
74,0: (0,0,0) #000000 gray(0)
75,0: (10,10,10) #0A0A0A gray(10)
76,0: (159,159,159) #9F9F9F gray(159)
77,0: (21,21,21) #151515 gray(21)
78,0: (156,156,156) #9C9C9C gray(156)
79,0: (14,14,14) #0E0E0E gray(14)
80,0: (3,3,3) #030303 gray(3)
81,0: (3,3,3) #030303 gray(3)
[....]
162,0: (3,3,3) #030303 gray(3)
163,0: (4,4,4) #040404 gray(4)
164,0: (10,10,10) #0A0A0A gray(10)
165,0: (7,7,7) #070707 gray(7)
166,0: (8,8,8) #080808 gray(8)
167,0: (8,8,8) #080808 gray(8)
168,0: (8,8,8) #080808 gray(8)
169,0: (9,9,9) #090909 gray(9)
170,0: (7,7,7) #070707 gray(7)
171,0: (10,10,10) #0A0A0A gray(10)
172,0: (5,5,5) #050505 gray(5)
173,0: (13,13,13) #0D0D0D gray(13)
174,0: (6,6,6) #060606 gray(6)
175,0: (10,10,10) #0A0A0A gray(10)
176,0: (10,10,10) #0A0A0A gray(10)
177,0: (7,7,7) #070707 gray(7)
178,0: (8,8,8) #080808 gray(8)
[....]
319,0: (3,3,3) #030303 gray(3)
320,0: (3,3,3) #030303 gray(3)
321,0: (14,14,14) #0E0E0E gray(14)
322,0: (156,156,156) #9C9C9C gray(156)
323,0: (21,21,21) #151515 gray(21)
324,0: (159,159,159) #9F9F9F gray(159)
325,0: (10,10,10) #0A0A0A gray(10)
326,0: (0,0,0) #000000 gray(0)
327,0: (0,0,0) #000000 gray(0)
[....]
397,0: (0,0,0) #000000 gray(0)
398,0: (0,0,0) #000000 gray(0)
399,0: (0,0,0) #000000 gray(0)
As you can see, the detected edges from the text also influenced the grayscale values of the pixels. So we could introduce an additional -threshold 50% operation into our commands, to get pure black+white output:
convert canny-edges.png -resize 400x1\! -threshold 50% canny-1col.txt
convert canny-edges.png -resize 1x300\! -threshold 50% canny-1row.txt
I'll not quote the contents of the new text files here, you can try it and look for yourself if you are interested. Instead, I'll do a shortcut: I'll output the textual representation of the pixel color values to <stdout> and directly grep it for all non-black pixels:
convert canny-edges.png -resize 400x1\! -threshold 50% txt:- \
| grep -v black
# ImageMagick pixel enumeration: 400,1,255,srgb
76,0: (255,255,255) #FFFFFF white
78,0: (255,255,255) #FFFFFF white
322,0: (255,255,255) #FFFFFF white
324,0: (255,255,255) #FFFFFF white
convert canny-edges.png -resize 1x300\! -threshold 50% txt:- \
| grep -v black
# ImageMagick pixel enumeration: 1,300,255,srgb
0,39: (255,255,255) #FFFFFF white
0,41: (255,255,255) #FFFFFF white
0,229: (255,255,255) #FFFFFF white
0,231: (255,255,255) #FFFFFF white
From the above results you can conclude that the four pixel coordinates of the sticky note inside the other image are:
upper right corner: (323|40)
lower left corner: (77|230)
The width of the area is 246 pixels and the height is 190 pixels.
(ImageMagick assumes the origin of its coordinate system to be the upper left corner of an image.)
To now cut the sticky note from the original image you can do:
convert http://i.stack.imgur.com/SxrwG.png[246x190+77+40] sticky-note.png
More options to explore
autotrace
You can streamline the above procedure even more (even transform it into an automatically working script if you want) by converting the intermediate "canny-edges.png" into an SVG vector graphic, for example by running it through autotrace...
This could be useful if your sticky note is tilted or rotated.
Hough Line Detection
Once you have the "canny" lines, you could also apply the Hough Line Detection algorithm on them:
convert \
canny-edges.png \
-background black \
-stroke red \
-hough-lines 5x5+20 \
lines.png
Note that the -hough-lines operator extends and draws detected lines from one edge (with floating point values) to another edge of the original image.
While the previous command finally converted the lines to a PNG the -hough-lines operator really generates an MVG file (Magick Vector Graphics) internally. That means you could actually read the source code of the MVG file, and determine the mathematical parameters of each line which is depicted in the "red lines" image:
convert \
canny-edges.png \
-hough-lines 5x5+20 \
lines.mvg
This is more sophisticated and also works for edges which are not strictly horizontal and/or vertical.
But your example image does use horizontal and vertical edges, so you can even use simple shell commands to discover these.
There are 80 line descriptions in total in the generated MVG file. You can identify all horizontal lines in that file:
cat lines.mvg \
| while read a b c d e ; do \
if [ x${b/0,/} == x${c/400,/} ]; then \
echo "$a $b $c $d $e" ; \
fi; \
done
line 0,39.5 400,39.5 # 249
line 0,62.5 400,62.5 # 48
line 0,71.5 400,71.5 # 52
line 0,231.5 400,231.5 # 249
Now identify all vertical lines:
cat lines.mvg \
| while read a b c d e; do \
if [ x${b/,0/} == x${c/,300} ]; then \
echo "$a $b $c $d $e" ; \
fi; \
done
line 76.5,0 76.5,300 # 193
line 324.5,0 324.5,300 # 193
I met a similar problem of detecting image borders (whitespace) last week and spent many hours trying various approaches and tools; in the end I solved it using an entropy-difference calculation approach, so JFYI here is the algorithm.
Let's assume you want to detect whether your 200x100px image has a border at the top:
Get the upper piece of the image, 25% of its height (25px): (0: 25, 0: 200)
Get a lower piece of the same height, starting where the upper piece ends and going deeper toward the image center: (25: 50, 0: 200)
upper and lower pieces depicted
Calculate entropies for both pieces
Find the entropy difference and store it together with the current block height
Make the upper piece 1px smaller (24px) and repeat from step 2 until we hit the image edge (height 0), resizing the scan area every iteration and thus sliding up to the image edge
Find the maximum of the stored entropy differences and its block height - this is the center of our border, provided it lies closer to the edge than to the center of the image and the maximum entropy difference is higher than a pre-set threshold (0.5, for example)
And apply this algorithm to every side of your image.
Here is a piece of code to detect whether an image has a top border and to find its approximate coordinate (offset from the top); pass a grayscale ('L' mode) Pillow image to the scan function:
import numpy as np

MEDIAN = 0.5

def entropy(signal):
    # Shannon entropy of the pixel-value distribution
    # (helper assumed here; the original snippet does not show its definition)
    _, counts = np.unique(signal, return_counts=True)
    probs = counts / signal.size
    return -np.sum(probs * np.log2(probs))

def scan(im):
    w, h = im.size
    array = np.array(im)
    center_ = None
    diff_ = None
    # slide the candidate border position from 25% of the height up to the edge
    for center in reversed(range(1, h // 4 + 1)):
        upper = entropy(array[0: center, 0: w].flatten())
        lower = entropy(array[center: 2 * center, 0: w].flatten())
        diff = upper / lower if lower != 0.0 else MEDIAN
        if center_ is None or diff_ is None:
            center_ = center
            diff_ = diff
        if diff < diff_:
            center_ = center
            diff_ = diff
    # border detected if the entropy ratio is low enough and lies in the top quarter
    return diff_ < MEDIAN and center_ < h // 4, center_, diff_
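To check the other sides, one simple option (my assumption; the linked project handles this differently) is to rotate the image and reuse the same scan function:

from PIL import Image

im = Image.open('photo.jpg').convert('L')   # placeholder file name
for side in range(4):
    # each 90-degree rotation brings a different original side to the top
    rotated = im.rotate(90 * side, expand=True)
    has_border, offset, ratio = scan(rotated)
    print(side, has_border, offset, ratio)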
Full source with examples of bordered and clear (not bordered) images processed is here: https://github.com/embali/enimda/
Given two images:
image1.jpg
image2.jpg
What's a fast way to detect if they are visually identical in Python? For example, they may have different EXIF data, which would yield different checksums even though the image data is the same.
ImageMagick has an excellent tool, "identify," that produces a visual hash of an image, but it's very processor intensive.
Using PIL/Pillow:
from PIL import Image
im1 = Image.open('image1.jpg')
im2 = Image.open('image2.jpg')
if list(im1.getdata()) == list(im2.getdata()):
    print("Identical")
else:
    print("Different")
I'm still submitting my way to tackle this, even if the OP says that ImageMagick's way is too processor intensive (and even though my way does not involve Python)... Maybe my answer is useful to other people arriving at this page via a search engine.
Be aware that any image comparison which is supposed to discover fine differences in hi-res images is more processor intensive than a discovery of big differences in low-res images, as it has to compare a lot more pixels.
Visualization of Differences
Here is an ImageMagick command that compares two (same-sized!) images and returns all differing pixels as red and identical pixels as white. The first command has the reference image as a faded-out background image for the composition of the red-white pixel matrix. .img may be any of the IM-supported formats (.png, .jpg, .jpeg, .tif, .tiff, .ppm, .gif, .pdf, ...):
compare reference.img similar.img delta.img
compare reference.img similar.img -compose src delta.img
By default, the comparison is made at 72 PPI. If you need more resolution (like, with a vector based image, such as a PDF page), you can add -density to increase it. Of course, the processing time will increase accordingly:
compare -density 300 reference.img similar.img delta.img
If you add a fuzz factor, you can tell ImageMagick to treat all pixels as identical which are no more than a certain color distance apart:
compare -fuzz '3%' reference.img similar.img -compose src delta.img
pHash-ed difference value
More recent versions of ImageMagick support the phash algorithm:
compare -metric phash reference.img similar.img -compose src delta.img
This will, besides creating the delta.img for visualization, return a numeric value that indicates the "difference" between two images. The closer it is to 0, the more similar are the two images compared.
Examples:
Create a few small PDF pages with minor differences in them. I'm using Ghostscript:
gs -o ref1.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (0) show showpage"
gs -o ref2.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (0) show showpage"
gs -o ref3.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (O) show showpage"
gs -o ref4.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (O) show showpage"
Now compare ref1.pdf with ref3.pdf at the default resolution of 72 PPI:
compare -metric phash ref1.pdf ref3.pdf delta-ref1-ref3.pdf
7.61662
The returned pHash value is 7.61662. This indicates that ImageMagick's compare discovered at least some differences.
Let's look at the visualization. I'll create a side-by-side visualization of the three PDFs/images (to be shown below):
convert \
-mattecolor blue \
\( ref1.pdf -frame 2x2 \) \
null: \
\( ref3.pdf -frame 2x2 \) \
null: \
\( delta-ref1-ref3.pdf -frame 2x2 \) \
+append \
ref1-ref3-delta.png
As you can see, the different shapes of the 0 (digit 'zero') and the O (letter o, capital version) are standing out quite well.
Now the next one: where ref1.pdf is compared to ref2.pdf, also at 72 PPI.
compare -metric phash ref1.pdf ref2.pdf delta-ref1-ref2.pdf
0
The returned pHash value now is 0. This indicates that ImageMagick discovered no difference!
Create a side-by-side visualization of the three PDFs/images:
convert \
-mattecolor blue \
\( ref1.pdf -frame 2x2 \) \
null: \
\( ref2.pdf -frame 2x2 \) \
null: \
\( delta-ref1-ref2.pdf -frame 2x2 \) \
+append \
ref1-ref2-delta.png
As you can see, at 72 PPI ImageMagick does not discover a difference between the two PDFs (as would be indicated by red pixels). According to the Ghostscript command, both show the digit 0, but at positions which are shifted by 0.1 pt apart in x- and y-directions. So in reality, in the original PDF, there IS a difference. But when rendered at 72 PPI, this difference isn't visible.
Let's try to see the difference with density 600 then:
compare \
-metric phash \
-density 600 \
ref1.pdf \
ref2.pdf \
ref1-ref2-at-density600-delta.png
0.00172769
The returned pHash value at 600 PPI now is 0.00172769. This is close to zero, but still a difference. The difference is less than the one between ref1.pdf and ref3.pdf.
The difference is clearly highlighted now in the visual comparison, even though only by a thin line of red pixels:
Using https://github.com/andrewekhalel/sewar to compare image similarity:
>>> from sewar.full_ref import uqi
>>> uqi(img1, img2)
0.9586952304831419
One way to do that in Python/OpenCV is to get the absdiff and then take the mean (average) of the absdiff over the whole image.
Input1 (PNG):
Input2 (JPG):
import cv2
import numpy as np
# read image 1
img1 = cv2.imread('lena.png')
# read image 2
img2 = cv2.imread('lena.jpg')
# do absdiff
diff = cv2.absdiff(img1,img2)
# get mean of absdiff
mean_diff = np.mean(diff)
# print result
print(mean_diff)
1.8992767333984375
Just because no one has mentioned it yet, Spatial CIELAB is another useful image similarity metric.
It's simpler than it sounds: you blur the two images by an amount related to the acuity of your observer, then find the CIELAB difference (delta E). You can take the peak or average of the difference image, depending on your application.
Using pyvips, you could write:
#!/usr/bin/python3
import sys
import pyvips
# the access hint means these images can be streamed in parallel rather
# than fully decoded
image1 = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
image2 = pyvips.Image.new_from_file(sys.argv[2], access="sequential")
# blur by an amount related to the visual acuity of the observer -- this will
# help remove peaks caused by small alignment differences, then take the
# CIELAB76 colour difference
sigma = 3.0
# diff = image1.gaussblur(sigma).dE76(image2.gaussblur(sigma))
diff = image1.resize(1.0 / sigma).dE76(image2.resize(1.0 / sigma))
# compute the peak difference ... over perhaps 20 means a visible difference
print(f"peak difference of {diff.max()} visual units")
As a small optimization, resizing rather than blurring reduces the number of pixels you need to compute the colour difference for.
This PC will compute a difference for a pair of 6k x 4k JPGs in about 400ms.
$ vipsheader ~/pics/theo.jpg
/home/john/pics/theo.jpg: 6048x4032 uchar, 3 bands, srgb, jpegload
$ time ./try51.py ~/pics/theo2.jpg ~/pics/theo.jpg
peak difference of 0.0 visual units
real 0m0.396s
user 0m0.952s
sys 0m0.197s