How to calculate SNR for dynamic range? - python

I am an absolute beginner when it comes to coding, so I don't really understand how to go about this. I know this question has been asked before, but I couldn't figure it out after checking those out. I hope someone can help me.
My ultimate goal is to find out what dynamic range the ARRI Amira actually uses; that is part of our project at uni (the specs say 14+ stops). I have shots ranging from 0.5 to 22 f-stops, and I also have a shot with the lens cap on (a dark frame, which is not 100% black, because there is noise). All frames are saved as TIFF (that's a requirement in our project). Would I first have to calculate P(noise) from the dark frame, since SNR = P(signal)/P(noise), and then use that value as P(noise) for the other images?
I also don't know how to calculate P(noise). My idea is to compare the dark-frame image to a dark frame that is actually completely black, maybe an array filled with zeros. I found code that included the calculation of MSE values. Would that work with my idea?
What I tried was this:
import cv2
import numpy as np

realDarkframe = np.zeros((1080, 1920, 3))       # an "ideal" dark frame: all zeros
darkframeOfAmira = cv2.imread("image.tif", 1)   # the camera's actual dark frame

def snr(img1, img2):
    mse = np.mean((img1 - img2) ** 2)           # this is the mean squared error, not an SNR
    return mse

d = snr(realDarkframe, darkframeOfAmira)
print(d)
That returns the value 565.7092896412037, and I don't know what that means. I also don't know how to calculate P(signal), even if this were correct and 565.7092896412037 actually equals P(noise).

The dynamic range of a measurement device (i.e. a camera) describes the ratio between the maximum and minimum measurable light intensities. Regarding this, two issues emerge in your implementation.
The dark-frame image that determines the lower bound of the dynamic range should not be plain zeros. That would imply an infinite dynamic range (division by zero in the SNR formula). Instead, you need to capture a dark shot, as you suggest:
"I also have a shot with the lens cap on (darkframe, which is not 100%
black, because there is noise)."
and use this image as a reference. You also need an image captured at maximum intensity (total white, i.e. "burnt").
The signal-to-noise computation in your code segment is the mean squared error. You should instead use the formula as described on Wikipedia: SNR = P(signal)/P(noise) = mean(signal^2) / mean(noise^2).
This code should work:
import cv2
import numpy as np

darkframeOfAmira = cv2.imread("dark_image.tif", 1)
brightframeOfAmira = cv2.imread("image.tif", 1)

def snr(img1, img2):
    # cast to float so squaring 8-bit values does not overflow; the epsilon avoids division by zero
    return np.mean(img1.astype(np.float64) ** 2) / (np.mean(img2.astype(np.float64) ** 2) + 1e-12)

d = snr(brightframeOfAmira, darkframeOfAmira)
print(d)
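If what you ultimately want is a dynamic-range figure in f-stops rather than a raw ratio: d above is a ratio of mean squared values (a power ratio), so its square root is the amplitude ratio, and the base-2 log of that gives stops. A small sketch reusing d from the snippet above, under the assumption that the TIFF pixel values are linear in light intensity:
import numpy as np

# d = mean(signal^2) / mean(noise^2) is a power ratio; sqrt gives the amplitude ratio,
# and log2 of that expresses it in stops (assumes linearly encoded pixel values).
dynamic_range_stops = np.log2(np.sqrt(d))
print("Approximate dynamic range: {:.2f} stops".format(dynamic_range_stops))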

Related

how to deal with negative values when subtracting the mean from the current pixel

I am subtracting the mean value of the image from each pixel using the following code (hopefully I am doing it right):
import glob
import cv2
import numpy as np

values_list = []
values_mean = []
for filename in glob.glob('video//frames//*.png'):
    img = cv2.imread(filename, 0)
    values_list.append(img[150, :])                                 # all pixel values of row 150
    values_mean.append(np.round(np.mean(img[150, :]), decimals=0))  # mean of that row
output = np.array(values_list)
values_mean = np.array(values_mean).reshape(-1, 1)
new_column_value = values_mean - output
When I plot the graph, I get
What is the best way to deal with negative values? Should I simply add an if statement that clips anything above 255 or below 0 to 0? But then someone mentioned "...you kill information on where the negatives are..", so how do I deal with this correctly?
I intend to calculate the shift value between frames by getting the maximal correlation value of the subtracted image and comparing it to the adjacent frame.
There are countless similar questions, but I cannot find solid ground in any of them.
If you're trying to determine "how far away from the mean is a given pixel", then wouldn't it make more sense to take the absolute value of your result?
new_column_value = np.absolute(values_mean - output)
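If you do want to keep working with the signed differences rather than absolute values, one caveat worth keeping in mind (the float mean in the question already avoids this, but it is easy to trip over): subtracting in uint8 wraps around instead of going negative. A small sketch with a hypothetical file path:
import cv2
import numpy as np

img = cv2.imread("video/frames/frame0.png", 0)   # hypothetical path; grayscale frame as in the question
row = img[150, :].astype(np.int16)               # signed copy, so subtraction can produce negatives
diff = row - int(np.round(row.mean()))           # signed deviations from the row mean
abs_diff = np.abs(diff)                          # distance from the mean, as suggested above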

Problems with using a rough greyscale algorithm?

So I'm designing a few programs for editing photos in python using PIL and one of them was converting an image to greyscale (I'm avoiding the use of any functions from PIL).
The algorithm I've employed is simple: for each pixel (colour-depth is 24), I've calculated the average of the R, G and B values and set the RGB values to this average.
My program was producing greyscale images which seemed accurate, but I was wondering if I'd employed the correct algorithm, and I came across this answer to a question, where it seems that the 'correct' algorithm is to calculate 0.299 R + 0.587 G + 0.114 B.
I decided to compare my program to this algorithm. I generated a greyscale image using my program and another one (using the same input) from a website online (the top Google result for 'image to grayscale').
To my naked eye, it seemed that they were exactly the same, and if there was any variation, I couldn't see it. However, I decided to use this website (top Google result for 'compare two images online') to compare my greyscale images. It turned out that deep in the pixels, they had slight variations, but none which were perceivable to the human eye at a first glance (differences can be spotted, but usually only when the images are laid upon each other or switched between within milliseconds).
My Questions (the first is the main question):
Are there any disadvantages to using my 'rough' greyscale algorithm?
Does anyone have any input images where my greyscale algorithm would produce a visibly different image to the one that would be 'correct' ?
Are there any colours/RGB combinations for which my algorithm won't work as well?
My key piece of code (if needed):
def greyScale(pixelTuple):
    return tuple([round(sum(pixelTuple) / 3)] * 3)
The 'correct' algorithm (which seems to heavily weight green):
def greyScale(pixelTuple):
    return tuple([round(0.299 * pixelTuple[0] + 0.587 * pixelTuple[1] + 0.114 * pixelTuple[2])] * 3)
My input image:
The greyscale image my algorithm produces:
The greyscale image which is 'correct':
When the greyscale images are compared online (highlighted red are the differences, using a fuzz of 10%):
Despite the variations in pixels highlighted above, the greyscale images above appear as nearly the exact same (at least, to me).
Also, regarding my first question, if anyone's interested, this site has done some analysis on different algorithms for conversions to greyscale and also has some custom algorithms.
EDIT:
In response to @Szulat's answer, my algorithm actually produces this image instead (ignore the bad cropping, the original image had three circles but I only needed the first one):
In case people are wondering what the reason for converting to greyscale is (as it seems that the algorithm depends on the purpose), I'm just making some simple photo editing tools in python so that I can have a mini-Photoshop and don't need to rely on the Internet to apply filters and effects.
Reason for Bounty: Different answers here are covering different things, which are all relevant and helpful. This makes it quite difficult to choose which answer to accept. I've started a bounty because I like a few answers listed here, but also because it'd be nice to have a single answer which covers everything I need for this question.
The images look pretty similar, but your eye can tell the difference, especially if you put one in place of the other:
For example, you can note that the flowers in the background look brighter in the averaging conversion.
It is not that there is anything intrinsically "bad" about averaging the three channels. The reason for that formula is that we do not perceive red, green and blue equally, so their contributions to the intensities in a grayscale image shouldn't be the same; since we perceive green more intensely, green pixels should look brighter in grayscale. However, as Mark commented, there is no unique perfect conversion to grayscale, since we see in color and everyone's vision is slightly different, so any formula will just try to make an approximation so that pixel intensities feel "right" for most people.
The most obvious example:
Original
Desaturated in Gimp (Lightness mode - this is what your algorithm does)
Desaturated in Gimp (Luminosity mode - this is what our eyes do)
So, don't average RGB. Averaging RGB is simply wrong!
(Okay, you're right, averaging might be valid in some obscure applications, even though it has no physical or physiological meaning when RGB values are treated as color. By the way, the "regular" way of doing weighted averaging is also incorrect in a more subtle way because of gamma: sRGB should first be linearized and then the final result converted back to sRGB (which would be equivalent to retrieving the L component in the Lab color space).)
You can use any conversion equation, scale, linearity. The one you found:
I = 0.299 R + 0.587 G + 0.114 B
is based on the average human eye's perception sensitivity to the primary colors (R, G, B), at least for the time period and population/hardware it was created for; bear in mind that those standards were created before LED, TFT, etc. screens.
There are several problems you are fighting against:
our eyes are not the same
Not all humans perceive color the same way. There are major discrepancies between genders and smaller ones between regions; even generation and age play a role. So even an average should be handled as just that: an "average".
We have different sensitivity to the intensity of light across the visible spectrum. The most sensitive color is green (hence the highest weight on it). But the XYZ curve peaks can sit at different wavelengths for different people (mine are shifted a bit, causing a difference in the recognition of certain wavelengths, like some shades of aqua: some see them as green, some as blue, even though none of them have any color-blindness disability).
monitors do not use the same wavelengths nor spectral dispersion
So if you take 2 different monitors, they might use slightly different wavelengths for R, G, B, or even different widths of the spectral filter (just use a spectroscope and see). Yes, they should be "normalized" by the hardware, but that is not the same as using normalized wavelengths. It is similar to the problems of using RGB vs. white-noise-spectrum light sources.
monitor linearity
Humans do not see on a linear scale: we are usually logarithmic/exponential (depends how you look at it), so yes, we can normalize that with hardware (or even software), but the problem is that if we linearize for one human, it means we get it wrong for another.
If you take all this together you can either use averages ... or special (and expensive) equipment to measure/normalize against some standard or against a calibrated person (depends on the industry).
But that is too much to handle in home conditions so leave all that for industry and use the weights for "average" like most of the world... Luckily our brain can handle it as you cannot see the difference unless you start comparing both images side by side or in an animation :). So I (would) do:
I = 0.299 R + 0.587 G + 0.114 B
R = I
G = I
B = I
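A small numpy sketch of exactly that weighted conversion, assuming an HxWx3 array with channels in R, G, B order (not code from the answer above, just the stated formula applied per pixel):
import numpy as np

def grey_weighted(rgb):
    # I = 0.299*R + 0.587*G + 0.114*B, then R = G = B = I
    intensity = rgb @ np.array([0.299, 0.587, 0.114])
    return np.repeat(intensity[..., None], 3, axis=2)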
There are many different methods for converting to greyscale, and they do give different results though the differences might be easier to see with different input colour images.
As we don't really see in greyscale, the "best" method is somewhat dependent on the application and somewhat in the eye of the beholder.
The alternative formula you refer to is based on the human eye being more sensitive to variations in green tones and therefore giving them a bigger weighting - similarly to a Bayer array in a camera where there are 2 green pixels for each red and blue one. Wiki - Bayer array
There are many formulas for the Luminance, depending on the R,G,B color primaries:
Rec.601/NTSC: Y = 0.299*R + 0.587*G + 0.114*B ,
Rec.709/EBU: Y = 0.213*R + 0.715*G + 0.072*B ,
Rec.2020/UHD: Y = 0.263*R + 0.678*G + 0.059*B .
This is all because our eyes are less sensitive to blue than to red than to green.
That being said, you are probably calculating Luma, not Luminance, so the formulas are all wrong anyway. For Constant-Luminance you must convert to linear-light
R = R' ^ 2.4 , G = G' ^ 2.4 , B = B' ^ 2.4 ,
apply the Luminance formula, and convert back to the gamma domain
Y' = Y ^ (1/2.4) .
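A minimal numpy sketch of that constant-luminance pipeline, assuming 8-bit gamma-encoded input, Rec.709 weights, and the simple 2.4-power approximation used above instead of the exact piecewise sRGB curve:
import numpy as np

def constant_luminance_grey(rgb8):
    rgb_linear = (rgb8 / 255.0) ** 2.4                  # R'G'B' -> approximately linear light
    y = rgb_linear @ np.array([0.213, 0.715, 0.072])    # Rec.709 luminance Y
    return (y ** (1 / 2.4)) * 255.0                     # back to the gamma domain: Y'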
Also, consider that converting a 3D color space to a 1D quantity loses 2/3 of the information, which can bite you in the next processing steps. Depending on the problem, sometimes a different formula is better, like V = MAX(R,G,B) (from HSV color space).
How do I know? I'm a follower and friend of Dr. Poynton.
The answers provided are enough, but I want to discuss this topic a bit more, in a different manner.
Since I learnt digital painting out of interest, I more often use HSV.
HSV is much more controllable while painting but, to keep it short, the main point is S: Saturation, which separates the concept of color from the light. Turning S to 0 already gives the 'computer' grey scale of an image.
from PIL import Image
import colorsys

def togrey(img):
    if isinstance(img, Image.Image):
        r, g, b = img.split()
        R = []
        G = []
        B = []
        for rd, gn, bl in zip(r.getdata(), g.getdata(), b.getdata()):
            h, s, v = colorsys.rgb_to_hsv(rd / 255., gn / 255., bl / 255.)
            s = 0
            _r, _g, _b = colorsys.hsv_to_rgb(h, s, v)
            R.append(int(_r * 255.))
            G.append(int(_g * 255.))
            B.append(int(_b * 255.))
        r.putdata(R)
        g.putdata(G)
        b.putdata(B)
        return Image.merge('RGB', (r, g, b))
    else:
        return None

a = Image.open('../a.jpg')
b = togrey(a)
b.save('../b.jpg')
This method truly preserves the 'brightness' of the original color; however, it does not take into account how the human eye processes the data.
In answer to your main question, there are disadvantages in using any single measure of grey. It depends on what you want from your image. For example, if you have colored text on a white background and you want to make the text stand out, you can use the minimum of the R, G, B values as your measure. But if you have black text on a colored background, you can use the maximum of the values for the same result. In my software I offer the option of the max, min or median value for the user to choose. The results on continuous-tone images are also illuminating.
In response to comments asking for more details, the code for a pixel is below (without any defensive measures).
int Ind0[3] = {0, 1, 2};    // all equal
int Ind1[3] = {2, 1, 0};    // top, mid, bot from mask...
int Ind2[3] = {1, 0, 2};
int Ind3[3] = {1, 2, 0};
int Ind4[3] = {0, 2, 1};
int Ind5[3] = {2, 0, 1};
int Ind6[3] = {0, 1, 2};
int Ind7[3] = {-1, -1, -1}; // not possible
int *Inds[8] = {Ind0, Ind1, Ind2, Ind3, Ind4, Ind5, Ind6, Ind7};

void grecolor(unsigned char *rgb, int bri, unsigned char *grey)
{   // pick out bot, mid or top according to the bri flag
    int r = rgb[0];
    int g = rgb[1];
    int b = rgb[2];
    int mask = 0;
    mask |= (r > g);
    mask <<= 1;
    mask |= (g > b);
    mask <<= 1;
    mask |= (b > r);
    grey[0] = rgb[Inds[mask][2 - bri]];   // 2, 1, 0 give bot, mid, top
}
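For comparison, the same min/mid/max idea is short to express in numpy as well; a hypothetical sketch, not the author's code, assuming an HxWx3 array:
import numpy as np

def grey_by_rank(rgb, mode="median"):
    if mode == "min":
        return rgb.min(axis=2)      # darkest channel per pixel
    if mode == "max":
        return rgb.max(axis=2)      # brightest channel per pixel
    return np.median(rgb, axis=2)   # middle channel value per pixel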

How can I extract this obvious event from this image?

EDIT: I have found a solution :D thanks for the help.
I've created an image processing algorithm which extracts this image from the data. It's complex, so I won't go into detail, but this image is essentially a giant numpy array (it's visualizing angular dependence of pixel intensity of an object).
I want to write a program which automatically determines when the curves switch direction. I have the data and I also have this image, but it turns out doing something meaningful with either has been tricky. Thresholding fails because there are bands of different background color. Sobel operators and Hough Transforms also do not work well for this same reason.
This is really easy for humans to see when this switch happens, but not so easy to tell a computer. Any tips? Thanks!
Edit: Thanks all, I'm now fitting lines to this image after convolution with a general Gaussian and skeletonization of the result. Any pointers on doing this would be appreciated :)
You can take a weighted dot product of successive columns to get a one-dimensional signal that is much easier to work with. You might be able to extract the patterns using this signal:
import numpy as np

A = np.loadtxt("img.txt")
N = A.shape[0]
L = np.logspace(1, 2, N)

X = []
for c0, c1 in zip(A.T, A.T[1:]):
    # weighted dot product of successive columns, normalized by their magnitudes
    x = c0.dot(c1 * L) / (np.linalg.norm(c0) * np.linalg.norm(c1))
    X.append(x)
X = np.array(X)

import pylab as plt
plt.matshow(A, alpha=.5)
plt.plot(X * 3 - X.mean(), 'k', lw=2)
plt.axis('tight')
plt.show()
This is absolutely not a complete answer to the question, but a useful observation that is too long for a comment. I'll delete if a better answer comes along.
With the help of Mark McCurry, I was able to get a good result.
Step 1: Load original image. Remove background by subtracting median of each vertical column from itself.
no_background = []
for i in range(num_frames):
    no_background.append(orig[:, i] - np.median(orig, 1))
no_background = np.array(no_background).T
Step 2: Change negative values to 0.
clipped_background = no_background.clip(min=0)
Step 3: Extract a 1D signal. Take weighted sum of the vertical columns, which relates the max intensity in a column to its position.
def exp_func(x):
    # weighted centroid of a column: positions weighted by intensity^10
    return np.dot(np.arange(len(x)), np.power(x, 10)) / (np.sum(np.power(x, 10)))

weighted_sum = np.apply_along_axis(exp_func, 0, clipped_background)
Step 4: Take the derivative of 1D signal.
conv = np.convolve([-1.,1],weighted_sum, mode='same')
pl.plot(conv)
Step 5: Determine when the derivative changes sign.
signs=np.sign(conv)
pl.plot(signs)
pl.ylim(-1.2,1.2)
Step 6: Apply a median filter to the above signal.
from scipy.ndimage import median_filter

filtered_signs = median_filter(signs, 5)  # pick the window size (second arg, an odd number) based on the result
pl.plot(filtered_signs)
pl.ylim(-1.2, 1.2)
Step 7: Find the indices (frame locations) of when the sign switches. Plot result.
def sign_switch(oneDarray):
    inds = []
    for ind in range(len(oneDarray) - 1):
        if (oneDarray[ind] < 0 and oneDarray[ind + 1] > 0) or (oneDarray[ind] > 0 and oneDarray[ind + 1] < 0):
            inds.append(ind)
    return np.array(inds)
switched_frames = sign_switch(filtered_signs)
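As a side note (not part of the original solution), the loop in sign_switch can be replaced with a vectorized expression that finds the same indices:
import numpy as np

# adjacent values with opposite signs multiply to a negative number
switched_frames = np.where(filtered_signs[:-1] * filtered_signs[1:] < 0)[0]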
For detecting tip positions or turning points, you might try using a corner detector on the original image (not the skeletonized one). As a corner detector the structure tensor could be applicable. The structure tensor is also useful for calculating the local orientation in an image.

recognize very low level picture and color it in plain black in Python

I'm working on an automated "bug count" algorithm, and I'm wondering how I can recognize a very low level of contrast (no additional bugs on the pad; assume the camera is at the same position and the light conditions are very similar). If I subtract two pics from each other, I won't get a clean black image, because the light conditions will be minimally different. Right now I apply a Gaussian filter and mahotas.thresholding.otsu() to draw out where there is a bug (put a white blob over top), and then I use ndimage.label() to count them.
However, if my input image "cropbugs.jpg" is very dark grey, I get some random output after applying otsu(), and my label() function returns a random bug count number. How do I recognize that my image is very dark grey or low in contrast and just set the bug count to 0?
Thanks
My code so far looks like:
import mahotas
import pylab
from scipy import ndimage

bug_img = mahotas.imread('cropbugs.jpg')
pylab.gray()
bug_img = ndimage.gaussian_filter(bug_img, 6)  # 8
T = mahotas.thresholding.otsu(bug_img)
pylab.imshow(bug_img > T)
labeled, nr_objects = ndimage.label(bug_img > T)
print "Bug Count: " + str(nr_objects)
pylab.imshow(labeled)
pylab.jet()
pylab.show()
I can see multiple ways of approaching this problem:
(This was the suggestion in the comments.) Define a fixed rule based on the mean value, the standard deviation, the maximum value, or some combination of them. You will end up with a test like one of the following:
bug_img.mean() + 2*bug_img.std() < THRESHOLD
bug_img.std() < THRESHOLD
bug_img.max() < THRESHOLD
sorted(bug_img.ravel())[-10] < THRESHOLD
Use a classification system based on texture features (see my answer to a related question earlier).
Go ahead and use label as if everything was good and then post-filter the results. For example:
labeled,nr_objects = mahotas.label(bug_img > T)
sizes = mahotas.labeled.labeled_size(labeled)
good_objects = (MIN_BUG_SIZE <= sizes) & (sizes <= MAX_BUG_SIZE)
print np.sum(good_objects)
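For completeness, a hedged sketch of how option 1 and the original labelling code might be combined to force the count to 0 on flat frames; THRESHOLD is a hypothetical value you would tune on shots known to contain no bugs:
import mahotas
from scipy import ndimage

bug_img = mahotas.imread('cropbugs.jpg')
bug_img = ndimage.gaussian_filter(bug_img, 6)

THRESHOLD = 30                       # hypothetical; calibrate on known-empty frames
if bug_img.std() < THRESHOLD:
    nr_objects = 0                   # essentially flat / low-contrast image: report zero bugs
else:
    T = mahotas.thresholding.otsu(bug_img)
    labeled, nr_objects = ndimage.label(bug_img > T)
print("Bug Count: " + str(nr_objects))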

Image comparison algorithm

I'm trying to compare images to each other to find out whether they are different. First I tried a Pearson correlation of the RGB values, which works quite well unless the pictures are a little bit shifted. So if I have two 100% identical images but one is moved a little, I get a bad correlation value.
Any suggestions for a better algorithm?
BTW, I'm talking about comparing thousands of images...
Edit:
Here is an example of my pictures (microscopic):
im1:
im2:
im3:
im1 and im2 are the same but a little bit shifted/cropped; im3 should be recognized as completely different...
Edit:
Problem is solved with the suggestions of Peter Hansen! Works very well! Thanks to all answers! Some results can be found here
http://labtools.ipk-gatersleben.de/image%20comparison/image%20comparision.pdf
A similar question was asked a year ago and has numerous responses, including one regarding pixelizing the images, which I was going to suggest as at least a pre-qualification step (as it would exclude very non-similar images quite quickly).
There are also links there to still-earlier questions which have even more references and good answers.
Here's an implementation using some of the ideas with Scipy, using your above three images (saved as im1.jpg, im2.jpg, im3.jpg, respectively). The final output shows im1 compared with itself, as a baseline, and then each image compared with the others.
>>> import scipy as sp
>>> from scipy.misc import imread
>>> from scipy.signal.signaltools import correlate2d as c2d
>>>
>>> def get(i):
... # get JPG image as Scipy array, RGB (3 layer)
... data = imread('im%s.jpg' % i)
... # convert to grey-scale using W3C luminance calc
... data = sp.inner(data, [299, 587, 114]) / 1000.0
... # normalize per http://en.wikipedia.org/wiki/Cross-correlation
... return (data - data.mean()) / data.std()
...
>>> im1 = get(1)
>>> im2 = get(2)
>>> im3 = get(3)
>>> im1.shape
(105, 401)
>>> im2.shape
(109, 373)
>>> im3.shape
(121, 457)
>>> c11 = c2d(im1, im1, mode='same') # baseline
>>> c12 = c2d(im1, im2, mode='same')
>>> c13 = c2d(im1, im3, mode='same')
>>> c23 = c2d(im2, im3, mode='same')
>>> c11.max(), c12.max(), c13.max(), c23.max()
(42105.00000000259, 39898.103896795357, 16482.883608327804, 15873.465425120798)
So note that im1 compared with itself gives a score of 42105, im2 compared with im1 is not far off that, but im3 compared with either of the others gives well under half that value. You'd have to experiment with other images to see how well this might perform and how you might improve it.
Run time is long... several minutes on my machine. I would try some pre-filtering to avoid wasting time comparing very dissimilar images, maybe with the "compare jpg file size" trick mentioned in responses to the other question, or with pixelization. The fact that you have images of different sizes complicates things, but you didn't give enough information about the extent of butchering one might expect, so it's hard to give a specific answer that takes that into account.
I have done this before with an image histogram comparison. My basic algorithm was this:
Split image into red, green and blue
Create normalized histograms for the red, green and blue channels and concatenate them into a vector (r0...rn, g0...gn, b0...bn), where n is the number of "buckets"; 256 should be enough
Subtract this histogram from the histogram of another image and calculate the distance
Here is some code with numpy and PIL:
import numpy
# im is a PIL image, e.g. im = Image.open("some_image.jpg")
r = numpy.asarray(im.convert("RGB", (1,0,0,0, 1,0,0,0, 1,0,0,0)))
g = numpy.asarray(im.convert("RGB", (0,1,0,0, 0,1,0,0, 0,1,0,0)))
b = numpy.asarray(im.convert("RGB", (0,0,1,0, 0,0,1,0, 0,0,1,0)))
# note: on modern numpy, drop new=True and use density=True instead of normed=True
hr, h_bins = numpy.histogram(r, bins=256, new=True, normed=True)
hg, h_bins = numpy.histogram(g, bins=256, new=True, normed=True)
hb, h_bins = numpy.histogram(b, bins=256, new=True, normed=True)
hist = numpy.array([hr, hg, hb]).ravel()
if you have two histograms, you can get the distance like this:
diff = hist1 - hist2
distance = numpy.sqrt(numpy.dot(diff, diff))
If the two images are identical, the distance is 0, the more they diverge, the greater the distance.
It worked quite well for photos for me but failed on graphics like texts and logos.
You really need to specify the question better, but, looking at those 5 images, the organisms all seem to be oriented the same way. If this is always the case, you can try doing a normalized cross-correlation between the two images and taking the peak value as your degree of similarity. I don't know of a normalized cross-correlation function in Python, but there is a similar fftconvolve() function and you can do the circular cross-correlation yourself:
from numpy import asarray, conj
from numpy.fft import rfftn, irfftn
from PIL import Image

a = asarray(Image.open('c603225337.jpg').convert('L'))
b = asarray(Image.open('9b78f22f42.jpg').convert('L'))
f1 = rfftn(a)
f2 = rfftn(b)
g = f1 * conj(f2)   # conjugate one spectrum so this is correlation rather than convolution
c = irfftn(g)
This won't work as written since the images are different sizes, and the output isn't weighted or normalized at all.
The location of the peak value of the output indicates the offset between the two images, and the magnitude of the peak indicates the similarity. There should be a way to weight/normalize it so that you can tell the difference between a good match and a poor match.
This isn't as good of an answer as I want, since I haven't figured out how to normalize it yet, but I'll update it if I figure it out, and it will give you an idea to look into.
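One way to get the normalization this answer is looking for (a sketch, not the author's method, assuming the two grayscale arrays have already been padded or cropped to the same shape) is to standardize both images first, so the correlation peak lands in [-1, 1], with 1.0 meaning identical up to a circular shift:
import numpy as np
from numpy.fft import rfftn, irfftn

def norm_xcorr_peak(a, b):
    # Scale each image so the sum of its squared values is 1;
    # the circular cross-correlation peak is then bounded by 1 (Cauchy-Schwarz).
    a = (a - a.mean()) / (a.std() * np.sqrt(a.size))
    b = (b - b.mean()) / (b.std() * np.sqrt(b.size))
    c = irfftn(rfftn(a) * np.conj(rfftn(b)), s=a.shape)
    return c.max()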
If your problem is about shifted pixels, maybe you should compare against a frequency transform.
The FFT should be OK (numpy has an implementation for 2D matrices), but I keep hearing that wavelets are better for this kind of task ^_^
About performance: if all the images are the same size, if I remember correctly, the FFTW package creates a specialised function for each FFT input size, so you can get a nice performance boost by reusing the same code... I don't know if numpy is based on FFTW, but if it's not, maybe you could investigate a little bit there.
Here you have a prototype... you can play a little bit with it to see which threshold fits with your images.
import Image
import numpy
import sys

def main():
    img1 = Image.open(sys.argv[1])
    img2 = Image.open(sys.argv[2])

    if img1.size != img2.size or img1.getbands() != img2.getbands():
        return -1

    s = 0
    for band_index, band in enumerate(img1.getbands()):
        m1 = numpy.fft.fft2(numpy.array([p[band_index] for p in img1.getdata()]).reshape(*img1.size))
        m2 = numpy.fft.fft2(numpy.array([p[band_index] for p in img2.getdata()]).reshape(*img2.size))
        s += numpy.sum(numpy.abs(m1 - m2))
    print s

if __name__ == "__main__":
    sys.exit(main())
Another way to proceed might be to blur the images and then subtract the pixel values of the two images. If the difference is non-zero, you can shift one of the images 1 px in each direction and compare again; if the difference is lower than in the previous step, you can repeat shifting in the direction of the gradient and subtracting until the difference is lower than a certain threshold or increases again. That should work if the radius of the blurring kernel is larger than the shift between the images.
Also, you can try some of the tools that are commonly used in photography workflows for blending multiple exposures or stitching panoramas, like the Pano Tools.
I did an image processing course long ago, and I remember that when matching I normally started by making the image grayscale and then sharpening the edges of the image so you only see edges. You (the software) can then shift and subtract the images until the difference is minimal.
If that difference is larger than the threshold you set, the images are not equal and you can move on to the next. Images below the threshold can then be analyzed next.
I do think that at best you can radically thin out the possible matches, but you will need to personally compare the remaining candidates to determine whether they're really equal.
I can't really show code as it was a long time ago, and I used Khoros/Cantata for that course.
First off, correlation is a very CPU-intensive and rather inaccurate measure of similarity. Why not just go for the sum of squared differences between individual pixels?
A simple solution, if the maximum shift is limited: generate all possible shifted images and find the one that is the best match. Make sure you calculate your match variable (i.e. correlation) only over the subset of pixels that can be matched in all shifted images. Also, your maximum shift should be significantly smaller than the size of your images.
If you want to use some more advanced image processing techniques, I suggest you look at SIFT; this is a very powerful method that (theoretically, anyway) can properly match items in images independent of translation, rotation and scale.
I guess you could do something like this:
estimate the vertical/horizontal displacement of the reference image vs. the comparison image; a simple SAD (sum of absolute differences) with motion vectors would do
shift the comparison image accordingly
compute the Pearson correlation you were trying to do
Shift measurement is not difficult.
Take a region (say about 32x32) in comparison image.
Shift it by x pixels in horizontal and y pixels in vertical direction.
Compute the SAD (sum of absolute difference) w.r.t. original image
Do this for several values of x and y in a small range (-10, +10)
Find the place where the difference is minimum
Pick that value as the shift motion vector
Note:
If the SAD is coming very high for all values of x and y then you can anyway assume that the images are highly dissimilar and shift measurement is not necessary.
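A rough numpy sketch of that SAD search, assuming two same-sized grayscale float arrays and a hypothetical +/-10 px search range (brute force, no attempt at speed):
import numpy as np

def estimate_shift(ref, comp, radius=10):
    # Compare a central region of ref against shifted windows of comp and
    # keep the (dy, dx) with the smallest sum of absolute differences.
    h, w = ref.shape
    r0, r1, c0, c1 = radius, h - radius, radius, w - radius
    patch = ref[r0:r1, c0:c1].astype(np.float64)
    best, best_sad = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            window = comp[r0 + dy:r1 + dy, c0 + dx:c1 + dx].astype(np.float64)
            sad = np.abs(patch - window).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad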
To get the imports to work correctly on my Ubuntu 16.04 (as of April 2017), I installed python 2.7 and these:
sudo apt-get install python-dev
sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk
sudo apt-get install python-scipy
sudo pip install pillow
Then I changed Snowflake's imports to these:
import scipy as sp
from scipy.ndimage import imread
from scipy.signal.signaltools import correlate2d as c2d
How awesome that Snowflake's script worked for me 8 years later!
I propose a solution based on the Jaccard index of similarity on the image histograms. See: https://en.wikipedia.org/wiki/Jaccard_index#Weighted_Jaccard_similarity_and_distance
You can compute the difference in the distribution of the pixel colors. This is indeed pretty invariant to translations.
from PIL.Image import Image
from typing import List

def jaccard_similarity(im1: Image, im2: Image) -> float:
    """Compute the similarity between two images.
    First, for each image a histogram of the pixel distribution is extracted.
    Then, the similarity between the histograms is compared using the weighted Jaccard index of similarity, defined as:
    Jsimilarity = sum(min(b1_i, b2_i)) / sum(max(b1_i, b2_i))
    where b1_i and b2_i are the ith histogram bin of images 1 and 2, respectively.
    The two images must have the same resolution and number of channels (depth).
    See: https://en.wikipedia.org/wiki/Jaccard_index
    where it is also called Ruzicka similarity."""
    if im1.size != im2.size:
        raise Exception("Images must have the same size. Found {} and {}".format(im1.size, im2.size))
    n_channels_1 = len(im1.getbands())
    n_channels_2 = len(im2.getbands())
    if n_channels_1 != n_channels_2:
        raise Exception("Images must have the same number of channels. Found {} and {}".format(n_channels_1, n_channels_2))
    assert n_channels_1 == n_channels_2
    sum_mins = 0
    sum_maxs = 0
    hi1 = im1.histogram()  # type: List[int]
    hi2 = im2.histogram()  # type: List[int]
    # Since the two images have the same number of channels, they must have the same number of bins in the histogram.
    assert len(hi1) == len(hi2)
    for b1, b2 in zip(hi1, hi2):
        min_b = min(b1, b2)
        sum_mins += min_b
        max_b = max(b1, b2)
        sum_maxs += max_b
    jaccard_index = sum_mins / sum_maxs
    return jaccard_index
Compared with the mean squared error, the Jaccard index always lies in the range [0,1], thus allowing comparisons among different image sizes.
Then you can compare the two images, but only after rescaling them to the same size! Otherwise the pixel counts would have to be normalized somehow. I used this:
import sys
from skincare.common.utils import jaccard_similarity
import PIL.Image
from PIL.Image import Image

file1 = sys.argv[1]
file2 = sys.argv[2]

im1 = PIL.Image.open(file1)  # type: Image
im2 = PIL.Image.open(file2)  # type: Image

print("Image 1: mode={}, size={}".format(im1.mode, im1.size))
print("Image 2: mode={}, size={}".format(im2.mode, im2.size))

if im1.size != im2.size:
    print("Resizing image 2 to {}".format(im1.size))
    im2 = im2.resize(im1.size, resample=PIL.Image.BILINEAR)

j = jaccard_similarity(im1, im2)
print("Jaccard similarity index = {}".format(j))
Testing on your images:
$ python CompareTwoImages.py im1.jpg im2.jpg
Image 1: mode=RGB, size=(401, 105)
Image 2: mode=RGB, size=(373, 109)
Resizing image 2 to (401, 105)
Jaccard similarity index = 0.7238955686269157
$ python CompareTwoImages.py im1.jpg im3.jpg
Image 1: mode=RGB, size=(401, 105)
Image 2: mode=RGB, size=(457, 121)
Resizing image 2 to (401, 105)
Jaccard similarity index = 0.22785529941822316
$ python CompareTwoImages.py im2.jpg im3.jpg
Image 1: mode=RGB, size=(373, 109)
Image 2: mode=RGB, size=(457, 121)
Resizing image 2 to (373, 109)
Jaccard similarity index = 0.29066426814105445
You might also consider experimenting with different resampling filters (like NEAREST or LANCZOS), as they, of course, alter the color distribution when resizing.
Additionally, consider that swapping the images changes the results, as the second image might be downsampled instead of upsampled (after all, cropping might better suit your case than rescaling).
