Difference between images of different sizes

Difference between images of different sizes - python

My problem is as follows. I have an image img0 (array shape (A,B,3)) and then a face img1 cut out from the middle of that image (by an algorithm I don't have access to: my input is only the whole image, and the face cut out from it), now an array shaped (C,D,3) where C<A and D<B. Now, I want to perform operations on the face (e.g., colour it differently) and then stick it back inside the original background (which is not coloured differently) -- these operations will not affect the shape of img1 array containing the face alone, it will remain (C,D,3). Something like img0-img1 doesn't work because of the shape mismatch.
I guess an approach like finding the starting coordinate of the face in img0 would work in the case that the face cut out is rectangular (which is possible for me to use, though not ideal), since it is guaranteed that the face is exactly identical in img1 and img0. That means, to get the background, we only need to find the starting coordinate of the img1 array in img0, cut out the subsequent elements (that correspond to img1) from img0, and we're left with the background. After I've done whatever I want to the face, I can use the new (C,D,3) array in place of the previous img1 part of the whole image (img0).
Is there a way to do this in Python? i.e., compute the difference between two images of different sizes, where one image is a 'subimage' of the other? Or, failing that, if we can find the starting coordinate of the rectangular portion of an image (img0) which corresponds to a rectangular cutout available to us (img1)?

Or, failing that, if we can find the starting coordinate of the rectangular ?portion of an image (img0) which corresponds to a rectangular cutout available to us (img1)?
One easy way to do that would be to cross-correlate your zero-mean cut-out with the zero-mean original image. As you have no noise added to the image, any maximum of the cross-correlation is a possible candidate.
However:
(i) If you don't use faces but e.g. blocks, there will be multiple maxima and you don't have an unique solution.
(ii) It is not exactly an elegant solution to your problem.
I modified the code example from [1] to make it clearer:
from scipy import signal, misc
import numpy as np
face = misc.face(gray=True)
face = face - np.mean(face)
face_cutout = np.copy(face[300:365, 670:750])
face_cutout = face_cutout - np.mean(face_cutout)
corr = signal.correlate2d(face, face_cutout, mode='valid')
y, x = np.unravel_index(np.argmax(corr), corr.shape) # find the match
print(f'x: {x} y: {y}')
[1] https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.correlate2d.html

Related

How to blur a image in Python using Pillow

I am trying to take an image and blur it in Python using Pillow.
The only things that are imported are random and Image.
The only functions of Image I can use are open, size, load, new, close, show and save.
I know that I have to take the average RGB values of the pixels surrounding every pixel and then set that value to be the RGB value of the center pixel.
The problem is that I don't know how to get these values and look at specific edge cases where there are fewer adjacent pixels.
I am not allowed to import anything and can only use lists, loops, if statements, dictionaries, and tuples.

Take a look at this: https://en.wikipedia.org/wiki/Kernel_%28image_processing%29
Basically, you'll want to loop over the image and compute the new value for each pixel. If you can change the size of the image, then you can simply ignore the borders and therefore the edge cases. Otherwise, you'll need to apply one of the edge-handling techniques listed in the wiki page.
Good luck!!

You can use numpy.lib.stride_tricks.as_strided in order to create a window along the two spatial dimensions which can then be used to average neighboring points via mean. The following uses a (3, 3) window, i.e. only considering direct neighbors but different values are possible of course. To work on the edges, the original image first gets padded by repeating the edge values. Then the window averaging process can be repeated an arbitrary number times, depending on the desired blur factor. Here is some example code:
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
original_image = np.array(Image.open('example.jpg'))
image = np.pad(original_image, ((1,), (1,), (0,)), mode='edge')
window = np.lib.stride_tricks.as_strided(
image,
image.shape + (3, 3),
image.strides + image.strides[:2]
)[:-2, :-2]
for __ in range(10):
image[1:-1, 1:-1] = window.mean(axis=(-1, -2))
plt.title('Original')
plt.imshow(original_image)
plt.figure()
plt.title('Blurred')
plt.imshow(image[1:-1, 1:-1])
plt.show()
And here are the two image versions for comparison:
Photo by Cameron Venti on Unsplash

How does cv2.merge((r,g,b)) works?

I am trying to do a linear filter on an image with RGB colors. I found a way to do that is by splitting the image to different color layers and then merge them.
i.e.:
cv2.split(img)
Sobel(b...)
Sobel(g...)
Sobel(r...)
cv2.merge((b,g,r))
I want to find out how cv2.merge((b,g,r)) works and how the final image will be constructed.

cv2.merge takes single channel images and combines them to make a multi-channel image. You've run the Sobel edge detection algorithm on each channel on its own. You are then combining the results together into a final output image. If you combine the results together, it may not make sense visually at first but what you would be displaying are the edge detection results of all three planes combined into a single image.
Ideally, hues of red will tell you the strength of the edge detection in the red channel, hues of green giving the strength of the detection for the green channel, and finally blue hues for the strength of detection in the blue.
Sometimes this is a good debugging tool so that you can semantically see all of the edge information for each channel in a single image. However, this will most likely be very hard to interpret for very highly complicated images with lots of texture and activity.
What is more usually done is to actually do an edge detection using a colour edge detection algorithm, or convert the image to grayscale and do the detection on that image instead.
As an example of the former, one can decompose the RGB image into HSV and use the colour information in this space to do a better edge detection. See this answer by Micka: OpenCV Edge/Border detection based on color.

This is my understanding. In OpenCV the function split() will take in the paced image input (being a multi-channel array) and split it into several separate single-channel arrays.
Within an image, each pixel has a spot sequentially within an array with each pixel having its own array to denote (r,g and b) hence the term multi channel. This set up allows any type of image such as bgr, rgb, or hsv to be split using the same function.
As Example (pretend these are separate examples so no variables are being overwritten)
b,g,r = cv2.split(bgrImage)
r,g,b = cv2.split(rgbImage)
h,s,v = cv2.split(hsvImage)
Take b,g,r arrayts for example. Each is a single channel array contains a portion of the split rgb image.
This means the image is being split out into three separate arrays:
rgbImage[0] = [234,28,19]
r[0] = 234
g[0] = 28
b[0] = 19
rgbImage[41] = [119,240,45]
r[41] = 119
g[14] = 240
b[14] = 45
Merge does the reverse by taking several single channel arrays and merging them together:
newRGBImage = cv2.merge((r,g,b))
the order in which the separated channels are passed through become important with this function.
Pseudo code:
cv2.merge((r,g,b)) != cv2.merge((b,g,r))
As an aside: Cv2.split() is an expensive function and the use of numpy indexing is must more efficient.
For more information check out opencv python tutorials

Comparing and plotting regions of the same color over a dataset of a few hundred images

A chem student asked me for help with plotting image segmenetation:
A stationary camera takes a picture of the experimental setup every second over a period of a few minutes, so like 300 images yield.
The relevant parts in the setup are two adjacent layers of differently-colored foams observed from the side, a 2-color sandwich shrinking from both sides, basically, except one of the foams evaporates a bit faster.
I'd like to segment each of the images in the way that would let me plot both foam regions' "width" against time.
Here is a "diagram" :)
I want to go from here --> To here
Ideally, given a few hundred of such shots, in which only the widths change, I get an array of scalars back that I can plot. (Going to look like a harmonic series on either side of the x-axis)
I have a bit of python and matlab experience, but have never used OpenCV or Image Processing toolbox in matlab, or actually never dealt with any computer vision in general. Could you guys throw like a roadmap of what packages/functions to use or steps one should take and i'll take it from there?
I'm not sure how to address these things:
-selecting at which slice along the length of the slice the algorithm measures the width(i.e. if the foams are a bit uneven), although this can be ignored.
-which library to use to segment regions of the image based on their color, (some k-means shenanigans probably), and selectively store the spatial parameters of the resulting segments?
-how to iterate that above over a number of files.
Thank you kindly in advance!

Assume your Intensity will be different after converting into gray scale ( if not, just convert to other color space like HSV or LAB, then just use one of the components)
img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
First, Threshold your grayscaled input into a few bands
ret,thresh1 = cv.threshold(img,128,255,cv.THRESH_BINARY)
ret,thresh2 = cv.threshold(img,27,255,cv.THRESH_BINARY_INV)
ret,thresh3 = cv.threshold(img,77,255,cv.THRESH_TRUNC)
ret,thresh4 = cv.threshold(img,97,255,cv.THRESH_TOZERO)
ret,thresh5 = cv.threshold(img,227,255,cv.THRESH_TOZERO_INV)
The value should be tested out by your actual data. Here Im just give a example
Clean up the segmented image using median filter with a radius larger than 9. I do expect some noise. You can also use ROI here to help remove part of noise. But personally I`m lazy, I just wrote program to handle all cases and angle
threshholed_images_aftersmoothing = cv2.medianBlur(threshholed_images,9)
Each band will be corresponding to one color (layer). Now you should have N segmented image from one source. where N is the number of layers you wish to track
Second use opencv function bounding rect to find location and width/height of each Layer AKA each threshholed_images_aftersmoothing. Eg. boundingrect on each sub-segmented images.
C++: Rect boundingRect(InputArray points)
Python: cv2.boundingRect(points) → retval¶
Last, the rect have x,y, height and width property. You can use a simple sorting order to sort from top to bottom layer based on rect attribute x. Run though all vieo to obtain the x(layer id) , height vs time graph.
Rect API
Public Attributes
_Tp **height** // this is what you are looking for
_Tp width
_Tp **x** // this tells you the position of the band
_Tp y
By plot the corresponding heights (|AB| or |CD|) over time, you can obtain the graph you needed.
The more correct way is to use Kalman filter to track the position and height graph as I would expect some sort of bubble will occur and will interfere with the height of the layers.
To be honest, i didnt expect a chem student to be good at this. Haha good luck
Anything wrong you can find me here or Email me if i`m not watching stackoverflow

You can select a region of interest straight down the middle of the foams, a few pixels wide. If you stack these regions for each image it will show the shrink over time.
If for example you use 3 pixel width for the roi, the result of 300 images will be a 900 pixel wide image, where the left is the start of the experiment and the right is the end. The following image can help you understand:
Though I have not fully tested it, this code should work. Note that there must only be images in the folder you reference.
import cv2
import numpy as np
import os
# path to folder that holds the images
path = '.'
# dimensions of roi
x = 0
y = 0
w = 3
h = 100
# store references to all images
all_images = os.listdir(path)
# sort images
all_images.sort()
# create empty result array
result = np.empty([h,0,3],dtype=np.uint8)
for image in all_images:
# load image
img = cv2.imread(path+'/'+image)
# get the region of interest
roi = img[y:y+h,x:x+w]
# add the roi to previous results
result = np.hstack((result,roi))
# optinal: save result as image
# cv2.imwrite('result.png',result)
# display result - can also plot with matplotlib
cv2.imshow('Result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Update after question edit:
If the foams have different colors, your can use easily separate them by color by converting the image you hsv and using inrange (example). This creates a mask (=2D array with values from 0-255, one for each pixel) that you can use to calculate average height and extract the parameters and area of the image.
You can find a script to help you find the HSV colors for separation on this GitHub

FFT on image with Python

I have a problem with FFT implementation in Python. I have completely strange results.
Ok so, I want to open image, get value of every pixel in RGB, then I need to use fft on it, and convert to image again.
My steps:
1) I'm opening image with PIL library in Python like this
from PIL import Image
im = Image.open("test.png")
2) I'm getting pixels
pixels = list(im.getdata())
3) I'm seperate every pixel to r,g,b values
for x in range(width):
for y in range(height):
r,g,b = pixels[x*width+y]
red[x][y] = r
green[x][y] = g
blue[x][y] = b
4). Let's assume that I have one pixel (111,111,111). And use fft on all red values like this
red = np.fft.fft(red)
And then:
print (red[0][0], green[0][0], blue[0][0])
My output is:
(53866+0j) 111 111
It's completely wrong I think. My image is 64x64, and FFT from gimp is completely different. Actually, my FFT give me only arrays with huge values, thats why my output image is black.
Do you have any idea where is problem?
[EDIT]
I've changed as suggested to
red= np.fft.fft2(red)
And after that I scale it
scale = 1/(width*height)
red= abs(red* scale)
And still, I'm getting only black image.
[EDIT2]
Ok, so lets take one image.
Assume that I dont want to open it and save as greyscale image. So I'm doing like this.
def getGray(pixel):
r,g,b = pixel
return (r+g+b)/3
im = Image.open("test.png")
im.load()
pixels = list(im.getdata())
width, height = im.size
for x in range(width):
for y in range(height):
greyscale[x][y] = getGray(pixels[x*width+y])
data = []
for x in range(width):
for y in range(height):
pix = greyscale[x][y]
data.append(pix)
img = Image.new("L", (width,height), "white")
img.putdata(data)
img.save('out.png')
After this, I'm getting this image , which is ok. So now, I want to make fft on my image before I'll save it to new one, so I'm doing like this
scale = 1/(width*height)
greyscale = np.fft.fft2(greyscale)
greyscale = abs(greyscale * scale)
after loading it. After saving it to file, I have . So lets try now open test.png with gimp and use FFT filter plugin. I'm getting this image, which is correct
How I can handle it?

Great question. I’ve never heard of it but the Gimp Fourier plugin seems really neat:
A simple plug-in to do fourier transform on you image. The major advantage of this plugin is to be able to work with the transformed image inside GIMP. You can so draw or apply filters in fourier space, and get the modified image with an inverse FFT.
This idea—of doing Gimp-style manipulation on frequency-domain data and transforming back to an image—is very cool! Despite years of working with FFTs, I’ve never thought about doing this. Instead of messing with Gimp plugins and C executables and ugliness, let’s do this in Python!
Caveat. I experimented with a number of ways to do this, attempting to get something close to the output Gimp Fourier image (gray with moiré pattern) from the original input image, but I simply couldn’t. The Gimp image appears to be somewhat symmetric around the middle of the image, but it’s not flipped vertically or horizontally, nor is it transpose-symmetric. I’d expect the plugin to be using a real 2D FFT to transform an H×W image into a H×W array of real-valued data in the frequency domain, in which case there would be no symmetry (it’s just the to-complex FFT that’s conjugate-symmetric for real-valued inputs like images). So I gave up trying to reverse-engineer what the Gimp plugin is doing and looked at how I’d do this from scratch.
The code. Very simple: read an image, apply scipy.fftpack.rfft in the leading two dimensions to get the “frequency-image”, rescale to 0–255, and save.
Note how this is different from the other answers! No grayscaling—the 2D real-to-real FFT happens independently on all three channels. No abs needed: the frequency-domain image can legitimately have negative values, and if you make them positive, you can’t recover your original image. (Also a nice feature: no compromises on image size. The size of the array remains the same before and after the FFT, whether the width/height is even or odd.)
from PIL import Image
import numpy as np
import scipy.fftpack as fp
## Functions to go from image to frequency-image and back
im2freq = lambda data: fp.rfft(fp.rfft(data, axis=0),
axis=1)
freq2im = lambda f: fp.irfft(fp.irfft(f, axis=1),
axis=0)
## Read in data file and transform
data = np.array(Image.open('test.png'))
freq = im2freq(data)
back = freq2im(freq)
# Make sure the forward and backward transforms work!
assert(np.allclose(data, back))
## Helper functions to rescale a frequency-image to [0, 255] and save
remmax = lambda x: x/x.max()
remmin = lambda x: x - np.amin(x, axis=(0,1), keepdims=True)
touint8 = lambda x: (remmax(remmin(x))*(256-1e-4)).astype(int)
def arr2im(data, fname):
out = Image.new('RGB', data.shape[1::-1])
out.putdata(map(tuple, data.reshape(-1, 3)))
out.save(fname)
arr2im(touint8(freq), 'freq.png')
(Aside: FFT-lover geek note. Look at the documentation for rfft for details, but I used Scipy’s FFTPACK module because its rfft interleaves real and imaginary components of a single pixel as two adjacent real values, guaranteeing that the output for any-sized 2D image (even vs odd, width vs height) will be preserved. This is in contrast to Numpy’s numpy.fft.rfft2 which, because it returns complex data of size width/2+1 by height/2+1, forces you to deal with one extra row/column and deal with deinterleaving complex-to-real yourself. Who needs that hassle for this application.)
Results. Given input named test.png:
this snippet produces the following output (global min/max have been rescaled and quantized to 0-255):
And upscaled:
In this frequency-image, the DC (0 Hz frequency) component is in the top-left, and frequencies move higher as you go right and down.
Now, let’s see what happens when you manipulate this image in a couple of ways. Instead of this test image, let’s use a cat photo.
I made a few mask images in Gimp that I then load into Python and multiply the frequency-image with to see what effect the mask has on the image.
Here’s the code:
# Make frequency-image of cat photo
freq = im2freq(np.array(Image.open('cat.jpg')))
# Load three frequency-domain masks (DSP "filters")
bpfMask = np.array(Image.open('cat-mask-bpfcorner.png')).astype(float) / 255
hpfMask = np.array(Image.open('cat-mask-hpfcorner.png')).astype(float) / 255
lpfMask = np.array(Image.open('cat-mask-corner.png')).astype(float) / 255
# Apply each filter and save the output
arr2im(touint8(freq2im(freq * bpfMask)), 'cat-bpf.png')
arr2im(touint8(freq2im(freq * hpfMask)), 'cat-hpf.png')
arr2im(touint8(freq2im(freq * lpfMask)), 'cat-lpf.png')
Here’s a low-pass filter mask on the left, and on the right, the result—click to see the full-res image:
In the mask, black = 0.0, white = 1.0. So the lowest frequencies are kept here (white), while the high ones are blocked (black). This blurs the image by attenuating high frequencies. Low-pass filters are used all over the place, including when decimating (“downsampling”) an image (though they will be shaped much more carefully than me drawing in Gimp 😜).
Here’s a band-pass filter, where the lowest frequencies (see that bit of white in the top-left corner?) and high frequencies are kept, but the middling-frequencies are blocked. Quite bizarre!
Here’s a high-pass filter, where the top-left corner that was left white in the above mask is blacked out:
This is how edge-detection works.
Postscript. Someone, make a webapp using this technique that lets you draw masks and apply them to an image real-time!!!

There are several issues here.
1) Manual conversion to grayscale isn't good. Use Image.open("test.png").convert('L')
2) Most likely there is an issue with types. You shouldn't pass np.ndarray from fft2 to a PIL image without being sure their types are compatible. abs(np.fft.fft2(something)) will return you an array of type np.float32 or something like this, whereas PIL image is going to receive something like an array of type np.uint8.
3) Scaling suggested in the comments looks wrong. You actually need your values to fit into 0..255 range.
Here's my code that addresses these 3 points:
import numpy as np
from PIL import Image
def fft(channel):
fft = np.fft.fft2(channel)
fft *= 255.0 / fft.max() # proper scaling into 0..255 range
return np.absolute(fft)
input_image = Image.open("test.png")
channels = input_image.split() # splits an image into R, G, B channels
result_array = np.zeros_like(input_image) # make sure data types,
# sizes and numbers of channels of input and output numpy arrays are the save
if len(channels) > 1: # grayscale images have only one channel
for i, channel in enumerate(channels):
result_array[..., i] = fft(channel)
else:
result_array[...] = fft(channels[0])
result_image = Image.fromarray(result_array)
result_image.save('out.png')
I must admit I haven't managed to get results identical to the GIMP FFT plugin. As far as I see it does some post-processing. My results are all kinda very low contrast mess, and GIMP seems to overcome this by tuning contrast and scaling down non-informative channels (in your case all chanels except Red are just empty). Refer to the image:

Remove spurious small islands of noise in an image - Python OpenCV

I am trying to get rid of background noise from some of my images. This is the unfiltered image.
To filter, I used this code to generate a mask of what should remain in the image:
element = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
mask = cv2.erode(mask, element, iterations = 1)
mask = cv2.dilate(mask, element, iterations = 1)
mask = cv2.erode(mask, element)
With this code and when I mask out the unwanted pixels from the original image, what I get is:
As you can see, all the tiny dots in the middle area are gone, but a lot of those coming from the denser area are also gone. To reduce the filtering, I tried changing the second parameter of getStructuringElement() to be (1,1) but doing this gives me the first image as if nothing has been filtered.
Is there any way where I can apply some filter that is between these 2 extremes?
In addition, can anyone explain to me what exactly does getStructuringElement() do? What is a "structuring element"? What does it do and how does its size (the second parameter) affect the level of filtering?

A lot of your questions stem from the fact that you're not sure how morphological image processing works, but we can put your doubts to rest. You can interpret the structuring element as the "base shape" to compare to. 1 in the structuring element corresponds to a pixel that you want to look at in this shape and 0 is one you want to ignore. There are different shapes, such as rectangular (as you have figured out with MORPH_RECT), ellipse, circular, etc.
As such, cv2.getStructuringElement returns a structuring element for you. The first parameter specifies the type you want and the second parameter specifies the size you want. In your case, you want a 2 x 2 "rectangle"... which is really a square, but that's fine.
In a more bastardized sense, you use the structuring element and scan from left to right and top to bottom of your image and you grab pixel neighbourhoods. Each pixel neighbourhood has its centre exactly at the pixel of interest that you're looking at. The size of each pixel neighbourhood is the same size as the structuring element.
Erosion
For an erosion, you examine all of the pixels in a pixel neighbourhood that are touching the structuring element. If every non-zero pixel is touching a structuring element pixel that is 1, then the output pixel in the corresponding centre position with respect to the input is 1. If there is at least one non-zero pixel that does not touch a structuring pixel that is 1, then the output is 0.
In terms of the rectangular structuring element, you need to make sure that every pixel in the structuring element is touching a non-zero pixel in your image for a pixel neighbourhood. If it isn't, then the output is 0, else 1. This effectively eliminates small spurious areas of noise and also decreases the area of objects slightly.
The size factors in where the larger the rectangle, the more shrinking is performed. The size of the structuring element is a baseline where any objects that are smaller than this rectangular structuring element, you can consider them as being filtered and not appearing in the output. Basically, choosing a 1 x 1 rectangular structuring element is the same as the input image itself because that structuring element fits all pixels inside it as the pixel is the smallest representation of information possible in an image.
Dilation
Dilation is the opposite of erosion. If there is at least one non-zero pixel that touches a pixel in the structuring element that is 1, then the output is 1, else the output is 0. You can think of this as slightly enlarging object areas and making small islands bigger.
The implications with size here is that the larger the structuring element, the larger the areas of the objects will be and the larger the isolated islands become.
What you're doing is an erosion first followed by a dilation. This is what is known as an opening operation. The purpose of this operation is to remove small islands of noise while (trying to) maintain the areas of the larger objects in your image. The erosion removes those islands while the dilation grows back the larger objects to their original sizes.
You follow this with an erosion again for some reason, which I can't quite understand, but that's ok.
What I would personally do is perform a closing operation first which is a dilation followed by an erosion. Closing helps group areas that are close together into a single object. As such, you see that there are some larger areas that are close to each other that should probably be joined before we do anything else. As such, I would do a closing first, then do an opening after so that we can remove the isolated noisy areas. Take note that I'm going to make the closing structuring element size larger as I want to make sure I get nearby pixels and the opening structuring element size smaller so that I don't want to mistakenly remove any of the larger areas.
Once you do this, I would mask out any extra information with the original image so that you leave the larger areas intact while the small islands go away.
Instead of chaining an erosion followed by a dilation, or a dilation followed by an erosion, use cv2.morphologyEx, where you can specify MORPH_OPEN and MORPH_CLOSE as the flags.
As such, I would personally do this, assuming your image is called spots.png:
import cv2
import numpy as np
img = cv2.imread('spots.png')
img_bw = 255*(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) > 5).astype('uint8')
se1 = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
se2 = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
mask = cv2.morphologyEx(img_bw, cv2.MORPH_CLOSE, se1)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, se2)
mask = np.dstack([mask, mask, mask]) / 255
out = img * mask
cv2.imshow('Output', out)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.imwrite('output.png', out)
The above code is pretty self-explanatory. First, I read in the image and then I convert the image to grayscale and threshold with an intensity of 5 to create a mask of what is considered object pixels. This is a rather clean image and so anything larger than 5 seems to have worked. For the morphology routines, I need to convert the image to uint8 and scale the mask to 255. Next, we create two structuring elements - one that is a 5 x 5 rectangle for the closing operation and another that is 2 x 2 for the opening operation. I run cv2.morphologyEx twice for the opening and closing operations respectively on the thresholded image.
Once I do that, I stack the mask so that it becomes a 3D matrix and divide by 255 so that it becomes a mask of [0,1] and then we multiply this mask with the original image so that we can grab the original pixels of the image back and maintaining what is considered a true object from the mask output.
The rest is just for illustration. I show the image in a window, and I also save the image to a file called output.png, and its purpose is to show you what the image looks like in this post.
I get this:
Bear in mind that it isn't perfect, but it's much better than how you had it before. You'll have to play around with the structuring element sizes to get something that you consider as a good output, but this is certainly enough to get you started.
C++ version
There have been some requests to translate the code I wrote above into the C++ version using OpenCV. I have finally gotten around to writing a C++ version of the code and this has been tested on OpenCV 3.1.0. The code for this is below. As you can see, the code is very similar to that seen in the Python version. However, I used cv::Mat::setTo on a copy of the original image and set whatever was not part of the final mask to 0. This is the same thing as performing an element-wise multiplication in Python.
#include <opencv2/opencv.hpp>
using namespace cv;
int main(int argc, char *argv[])
{
// Read in the image
Mat img = imread("spots.png", CV_LOAD_IMAGE_COLOR);
// Convert to black and white
Mat img_bw;
cvtColor(img, img_bw, COLOR_BGR2GRAY);
img_bw = img_bw > 5;
// Define the structuring elements
Mat se1 = getStructuringElement(MORPH_RECT, Size(5, 5));
Mat se2 = getStructuringElement(MORPH_RECT, Size(2, 2));
// Perform closing then opening
Mat mask;
morphologyEx(img_bw, mask, MORPH_CLOSE, se1);
morphologyEx(mask, mask, MORPH_OPEN, se2);
// Filter the output
Mat out = img.clone();
out.setTo(Scalar(0), mask == 0);
// Show image and save
namedWindow("Output", WINDOW_NORMAL);
imshow("Output", out);
waitKey(0);
destroyWindow("Output");
imwrite("output.png", out);
}
The results should be the same as what you get in the Python version.

One can also remove small pixel clusters using the remove_small_objects function in skimage:
import matplotlib.pyplot as plt
from skimage import morphology
import numpy as np
import skimage
# read the image, grayscale it, binarize it, then remove small pixel clusters
im = plt.imread('spots.png')
grayscale = skimage.color.rgb2gray(im)
binarized = np.where(grayscale>0.1, 1, 0)
processed = morphology.remove_small_objects(binarized.astype(bool), min_size=2, connectivity=2).astype(int)
# black out pixels
mask_x, mask_y = np.where(processed == 0)
im[mask_x, mask_y, :3] = 0
# plot the result
plt.figure(figsize=(10,10))
plt.imshow(im)
This displays:
To retain only larger clusters, try increasing min_size (smallest size of retained clusters) and decreasing connectivity (size of pixel neighborhood when forming clusters). Using just those two parameters, one can retain only pixel clusters of an appropriate size.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.