Using python to select an area of an Image

I am trying to select an area of an image so that I can run some analysis on that specific area.
However, when I searched online, I could only find guides on how to select a rectangular area. I need to select an area that is drawn with my mouse. An example of such an area is included below.
Could anyone recommend keywords or libraries to search for, or links to guides that do something similar?
Also, I am not sure whether this is necessary information, but the analysis I want to run on the region of interest is to find the ratio of white to black pixels in that area.

I produced a simple working example based on this answer. I also tried using scipy.ndimage.binary_fill_holes, but could not get it to work. Note that the provided function takes a little longer since it assumes that the input image is grayscale and not binarized.
I specifically avoided using OpenCV, since I find the setup a bit tedious, but I think it should also provide an equivalent (see here).
Additionally, my "binarization" is kind of hacky, but you can probably figure out how to parse your image into a valid format yourself (and it might be easier if you produce the result within a program). In any case, I would suggest making sure that you use a lossless image format, since JPEG compression might break the connectivity of the drawn outline and cause issues in certain cases.
import scipy as sp
import numpy as np
import scipy.ndimage
import matplotlib.pyplot as plt

def flood_fill(test_array, h_max=255):
    input_array = np.copy(test_array)
    # 8-connected structuring element (np.int was removed from NumPy; use int)
    el = sp.ndimage.generate_binary_structure(2, 2).astype(int)
    # every pixel except the outermost border of the image
    inside_mask = sp.ndimage.binary_erosion(~np.isnan(input_array), structure=el)
    # start with the interior set to the maximum value
    output_array = np.copy(input_array)
    output_array[inside_mask] = h_max
    output_old_array = np.copy(input_array)
    output_old_array.fill(0)
    # 4-connected structuring element for the iterative erosion
    el = sp.ndimage.generate_binary_structure(2, 1).astype(int)
    # erode until nothing changes; the input acts as a lower bound
    while not np.array_equal(output_old_array, output_array):
        output_old_array = np.copy(output_array)
        output_array = np.maximum(input_array, sp.ndimage.grey_erosion(output_array, size=(3, 3), footprint=el))
    return output_array

x = plt.imread("test.jpg")
# "convert" to grayscale and invert
binary = 255 - x[:, :, 0]
filled = flood_fill(binary)
plt.imshow(filled)
plt.show()
This produces the following result:
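For the white-to-black ratio mentioned in the question, a minimal sketch could look like the following. It is not part of the original answer: the function name `white_black_ratio`, the `threshold` of 128, and the idea of passing the region as a boolean mask are all assumptions made for illustration.

```python
import numpy as np

def white_black_ratio(gray, region_mask, threshold=128):
    """Ratio of white to black pixels inside a boolean region mask.

    gray        : 2D array of grayscale values (assumed 0..255)
    region_mask : 2D boolean array, True inside the region of interest
    threshold   : assumed cut-off; values >= threshold count as white
    """
    pixels = gray[region_mask]
    white = int(np.count_nonzero(pixels >= threshold))
    black = int(pixels.size - white)
    # Avoid a division by zero when the region contains no black pixels
    return white / black if black else float("inf")
```

With the arrays from the code above, this might be called as `white_black_ratio(x[:, :, 0], filled == 255)`, assuming that the filled region (including the drawn outline) ends up at the value 255 and the background does not.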

Related

Is there a way to extract pixel color information from an image with OpenImageIO's python bindings

I want to ask the more experienced people how to get the RGB values of a pixel in an image using oiio's python bindings.
I have only just begun using oiio and am unfamiliar with the library as well as image manipulation on code level.
I poked around the documentation and I don't quite understand how the parameters work. It doesn't seem to work the same as python and I'm having a hard time trying to figure out, as I don't know C.
A) What command to even use to get pixel information (seems like get_pixel could work)
and
B) How to get it to work. I'm not understanding the parameters requirements exactly.
Edit:
I'm trying to convert the C example of visiting all pixels to get an average color in the documentation into something pythonic, but am getting nowhere.
Would appreciate any help, thank you.
Edit: adding the code
buffer = oiio.ImageBuf('image.tif')
array = buffer.read(0, 0, True)
print buffer.get_pixels(array)
the error message I get is:
# Error: Python argument types in
# ImageBuf.get_pixels(ImageBuf, bool)
# did not match C++ signature:
# get_pixels(class OpenImageIO::v1_5::ImageBuf, enum OpenImageIO::v1_5::TypeDesc::BASETYPE)
# get_pixels(class OpenImageIO::v1_5::ImageBuf, enum OpenImageIO::v1_5::TypeDesc::BASETYPE, struct OpenImageIO::v1_5::ROI)
# get_pixels(class OpenImageIO::v1_5::ImageBuf, struct OpenImageIO::v1_5::TypeDesc)
# get_pixels(class OpenImageIO::v1_5::ImageBuf, struct OpenImageIO::v1_5::TypeDesc, struct OpenImageIO::v1_5::ROI)

OpenImageIO has several classes for dealing with images, with different levels of abstraction. Assuming that you are interested in the ImageBuf class, I think the simplest way to access individual pixels from Python (with OpenImageIO 2.x) would look like this:
import OpenImageIO as oiio
buf = oiio.ImageBuf("foo.jpg")
p = buf.getpixel(50, 50)  # x, y
print(p)
p holds one float value per channel, so this will produce output like
(0.01148223876953125, 0.0030574798583984375, 0.0180511474609375)

Discover transformation required to align images of standardized documents

My question is not too far off from the "Image Alignment (ECC) in OpenCV ( C++ / Python )" article.
I also found the following article about facial alignment to be very interesting, but WAY more complex than my problem.
Wow! I can really go down the rabbit-hole.
My question is WAY more simple.
I have a scanned document that I have treated as a "template". In this template I have manually mapped the pixel regions that I require info from as:
area = (x1,y1,x2,y2)
such that x1<x2, y1<y2.
Now, these regions are, as is likely obvious, a bit too specific to my "template".
All other files that I want to extract data from are mostly shifted by some unknown amount such that their true area for my desired data is:
area = (x1 + ε1, y1 + ε2, x2 + ε1, y2 + ε2)
Where ε1, ε2 are unknown in advance.
But the documents are otherwise HIGHLY similar outside of this shift.
I want to discover, ideally through opencv, what translation is required (ignoring Euclidean transforms for the time being) to "align" these images, so that I can discover my ε, shift my area, and parse my data directly.
I have thought about using tesseract to mine the text from the document and then parse from there, but there are check boxes, either filled or empty, that contain meaningful information for my problem.
The code I currently have for cropping the image is:
from PIL import Image
img = Image.open(img_path)
area = area_lookup['key']
cropped_img = img.crop(area)
cropped_img.show()
My two sample files are attached.
My two images are:
We can assume my first image is my "template".
As you can see, the two images are very "similar", but one is moved slightly (human error). There may be cases where the rotation is more extreme, or the image is shifted more.
I would like to transform image 2 so that it is as closely aligned to image 1 as possible, and then parse data from it.
Any help would be sincerely appreciated.
Thank you very much
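One common way to estimate a pure translation between two otherwise very similar images is phase correlation. The sketch below is only an illustration under stated assumptions: both pages are already loaded as 2D grayscale NumPy arrays of the same shape, the shift is a whole number of pixels, and the names `estimate_shift` and `shift_area` are hypothetical helpers, not part of OpenCV or PIL. It does not handle rotation.

```python
import numpy as np

def estimate_shift(template, image):
    """Estimate the integer (dy, dx) by which `image` is shifted relative to `template`."""
    f_t = np.fft.fft2(template)
    f_i = np.fft.fft2(image)
    # Normalised cross-power spectrum: keeps only the phase difference
    cross_power = f_i * np.conj(f_t)
    cross_power /= np.abs(cross_power) + 1e-12
    corr = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around)
    dy, dx = (p - n if p > n // 2 else p for p, n in zip(peak, corr.shape))
    return int(dy), int(dx)

def shift_area(area, dy, dx):
    """Apply the estimated shift to an (x1, y1, x2, y2) crop box."""
    x1, y1, x2, y2 = area
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```

In the notation of the question, `dx` plays the role of ε1 and `dy` the role of ε2, so the shifted box could then be passed to `img.crop(...)` as before.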

Image warping with scikit-image and transform.PolynomialTransform

I attach a zip archive with all the files needed to illustrate and reproduce the problem.
(I don't have permissions to upload images yet...)
I have an image (test2.png in the zip archive ) with curved lines.
I try to warp it so the lines are straight.
I thought of using scikit-image transform, and in particular transform.PolynomialTransform because the transformation involves high order distortions.
So first I measure the precise position of each line at regular intervals in x to define the input interest points (in the file source_test2.csv).
Then I compute the corresponding desired positions, located along a straight line (in the file destination_test2.csv).
The figure correspondence.png shows what it looks like.
Next, I simply call transform.PolynomialTransform() using a polynomial of order 3.
It finds a solution, but when I apply it using transform.warp(), the result is crazy, as illustrated in the file Crazy_Warped.png
Can anybody tell me what I am doing wrong?
I tried polynomial of order 2 without luck...
I managed to get a good transformation for a sub-image (the first 400 columns only).
Is transform.PolynomialTransform() completely unstable in a case like mine?
Here is the entire code:
import numpy as np
import matplotlib.pyplot as plt
import asciitable
import matplotlib.pylab as pylab
from skimage import io, transform
# read image
orig=io.imread("test2.png",as_grey=True)
# read tables with reference points and their desired transformed positions
source=asciitable.read("source_test2.csv")
destination=asciitable.read("destination_test2.csv")
# format as numpy.arrays as required by scikit-image
# (need to add 1 because I started to count positions from 0...)
source=np.column_stack((source["x"]+1,source["y"]+1))
destination=np.column_stack((destination["x"]+1,destination["y"]+1))
# Plot
plt.imshow(orig, cmap='gray', interpolation='nearest')
plt.plot(source[:,0],source[:,1],'+r')
plt.plot(destination[:,0],destination[:,1],'+b')
plt.xlim(0,orig.shape[1])
plt.ylim(0,orig.shape[0])
# Compute the transformation
t = transform.PolynomialTransform()
t.estimate(destination,source,3)
# Warping the image
img_warped = transform.warp(orig, t, order=2, mode='constant',cval=float('nan'))
# Show the result
plt.imshow(img_warped, cmap='gray', interpolation='nearest')
plt.plot(source[:,0],source[:,1],'+r')
plt.plot(destination[:,0],destination[:,1],'+b')
plt.xlim(0,img_warped.shape[1])
plt.ylim(0,img_warped.shape[0])
# Save as a file
io.imsave("warped.png",img_warped)
Thanks in advance!

There are a couple of things wrong here, and they mainly have to do with coordinate conventions. For example, if we examine the code where you plot the original image and then put the clicked points on top of it:
plt.imshow(orig, cmap='gray', interpolation='nearest')
plt.plot(source[:,0],source[:,1],'+r')
plt.xlim(0,orig.shape[1])
plt.ylim(0,orig.shape[0])
(I've taken out the destination points to make it cleaner) then we get the following image:
As you can see, the y-axis is flipped. If we invert the y-axis with:
source[:,1] = orig.shape[0] - source[:,1]
before plotting, then we get the following:
So that is the first problem (don't forget to invert the destination points as well). The second has to do with the transform itself:
t.estimate(destination,source,3)
From the documentation we see that the call takes the source points first, then the destination points. So the order of those arguments should be flipped.
Lastly, the clicked points are of the form (x,y), but the image is stored as (y,x), so we have to transpose the image before applying the transform and then transpose back again:
img_warped = transform.warp(orig.transpose(), t, order=2, mode='constant',cval=float('nan'))
img_warped = img_warped.transpose()
When you make these changes, you get the following warped image:
These lines aren't perfectly flat but it makes much more sense.

Thank you very much for the detailed answer! I cannot believe I did not see the axis inversion problem... Thanks for catching it!
But I am afraid your final solution does not solve my problem... The image you get is still crazy. It should be continuous and not have such big holes and weird distortions... (see final solution below)
I found I could get a reasonable solution using RANSAC:
from skimage.measure import ransac
t, inliers = ransac((destination,source), transform.PolynomialTransform, min_samples=20,residual_threshold=1.0, max_trials=1000)
outliers = inliers == False
I then get the following result
Note that I think I was right using (destination,source) in that order! I think it has to do with the fact that transform.warp requires the inverse_map as input for the transformation object, not the forward map. But maybe I am wrong? The good result I am getting suggests it's correct.
I guess that polynomial transforms are too unstable, and that using RANSAC makes it possible to get a reasonable solution.
My problem was then to find a way to change the polynomial order in the RANSAC call...
transform.PolynomialTransform() does not take any parameters, and uses by default a 2nd order polynomial, but from the result I can see I would need a 3rd or 4th order polynomial.
So I opened a new question, and got a solution from Stefan van der Walt. Follow the link to see how to do it.
Thanks again for your help!
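The point about transform.warp expecting an inverse map (output coordinates to input coordinates) can be illustrated with a small NumPy-only sketch. This is not how scikit-image is implemented; `warp_nearest` is a hypothetical nearest-neighbour stand-in, written only to show why a transform estimated from destination to source is the one that fits this kind of lookup.

```python
import numpy as np

def warp_nearest(image, inverse_map):
    """Minimal nearest-neighbour warp.

    inverse_map takes an (N, 2) array of output (x, y) coordinates and
    returns the (x, y) coordinates to sample from in the input image.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    out_xy = np.column_stack((xs.ravel(), ys.ravel())).astype(float)
    in_xy = inverse_map(out_xy)
    in_x = np.rint(in_xy[:, 0]).astype(int)
    in_y = np.rint(in_xy[:, 1]).astype(int)
    # Output pixels whose source falls outside the input stay at 0
    valid = (in_x >= 0) & (in_x < w) & (in_y >= 0) & (in_y < h)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[in_y[valid], in_x[valid]]  # note the (row, col) = (y, x) indexing
    return out.reshape(h, w)
```

To move the image content two pixels to the right, the map passed in has to subtract two from x, for example `warp_nearest(img, lambda xy: xy - [2, 0])`: each output pixel asks where in the input it comes from.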

Not able to display/Convert Image

I am new to Python and Opencv.
I am using the following code.
import Image
import ImageChops
im1 = Image.open("img1.png")
im2 = Image.open("img2.png")
diff = ImageChops.difference(im2, im1)
When I do cv.ShowImage, it asks me to convert it. I am trying all kinds of convert but there is always an error.
The only way I can see the image is by doing the following.
diff.save("final","JPEG")
Is there another way I can convert to an IplImage or CvMat?

cv.SaveImage(diff, cv.LoadImage(diff)) might work, using the opencv function.
EDIT: In sight of the comment below, I think trying
cv.SaveImage(diff, cv.LoadImage(diff))
cv.ShowImage('box name', diff)
might work.

The difference image contains negative pixel values, so I don't think cv.ShowImage can display it 'as is'.
The range of possible pixel values after subtraction is -255 to 255. You might want to normalize pixel values first, by
new_value = (old_value + 255)/2
I don't use OpenCV on Python, so I cannot post code for the above.
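The normalization formula above can be sketched with NumPy as follows. This is an illustration only: `normalized_difference` is a hypothetical helper, and it assumes a signed subtraction computed by hand on two uint8 arrays of the same shape. Pillow's documentation describes ImageChops.difference as returning the absolute value of the difference, so that result should already lie in the 0 to 255 range.

```python
import numpy as np

def normalized_difference(a, b):
    """Map the signed difference a - b from [-255, 255] into [0, 255]."""
    # Widen to a signed type first so the subtraction cannot wrap around
    signed = a.astype(np.int16) - b.astype(np.int16)
    # new_value = (old_value + 255) / 2, using integer division
    return ((signed + 255) // 2).astype(np.uint8)
```

A PIL image can be turned into such an array with `np.asarray(img)`; identical pixels then map to the mid-gray value 127.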

Categorize different images

I have a number of images from Chinese genealogies, and I would like to be able to programatically categorize them. Generally speaking, one type of image has primarily line-by-line text, while the other type may be in a grid or chart format.
Example photos
'Desired' type: http://www.flickr.com/photos/63588871@N05/8138563082/
'Other' type: http://www.flickr.com/photos/63588871@N05/8138561342/in/photostream/
Question: Is there a (relatively) simple way to do this? I have experience with Python, but little knowledge of image processing. Direction to other resources is appreciated as well.
Thanks!

Assuming that at least some of the grid lines are exactly or almost exactly vertical, a fairly simple approach might work.
I used PIL to find all the columns in the image where more than half of the pixels were darker than some threshold value.
Code
from PIL import Image, ImageDraw  # PIL / Pillow modules

withlines = Image.open('withgrid.jpg')
nolines = Image.open('nogrid.jpg')

def findlines(image):
    w, h = image.size
    s = w * h
    # threshold on a grayscale copy: dark pixels become 255, the rest 0
    im = image.convert('L').point(lambda i: 255 * (i < 60))
    d = im.getdata()  # faster than per-pixel operations
    linecolumns = []
    for col in range(w):
        # count the dark pixels in this column
        black = sum(d[x] for x in range(col, s, w)) // 255
        if black > 450:  # fixed count chosen for the sample images
            linecolumns += [col]
    # return an image showing the detected lines
    im2 = image.convert('RGB')
    draw = ImageDraw.Draw(im2)
    for col in linecolumns:
        draw.line((col, 0, col, h - 1), fill='#f00', width=1)
    return im2

findlines(withlines).show()
findlines(nolines).show()
Results
showing detected vertical lines in red for illustration
As you can see, four of the grid lines are detected, and, with some processing to ignore the left and right sides and the center of the book, there should be no false positives on the desired type.
This means that you could use the above code to detect black columns, discard those that are near to the edge or the center. If any black columns remain, classify it as the "other" undesired class of pictures.
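The classification step described above could be sketched like this. The function name `is_grid_page` and the two margin fractions are assumptions made for illustration; they would need tuning on real scans.

```python
def is_grid_page(line_columns, width, edge_margin=0.05, center_margin=0.05):
    """Return True if dark columns remain after ignoring the edges and the book center.

    line_columns  : x positions of detected dark columns
    width         : image width in pixels
    edge_margin   : assumed fraction of the width to ignore at each side
    center_margin : assumed fraction of the width to ignore around the center
    """
    edge = edge_margin * width
    center = width / 2
    half_band = center_margin * width
    remaining = [
        c for c in line_columns
        if edge <= c < width - edge and abs(c - center) > half_band
    ]
    # Any surviving column suggests a grid or chart layout ("other" type)
    return len(remaining) > 0
```

This assumes `findlines` is adapted to return the list of detected columns rather than only the annotated image.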

AFAIK, there is no easy way to solve this. You will need a decent amount of image processing and some basic machine learning to classify these kinds of images (and even then it probably won't be 100% successful).
Another note:
While this can be solved using machine learning techniques alone, I would advise you to start by looking into some image processing techniques first and try to convert your images into a form in which the two types differ clearly. For this, it is best to start by reading about the FFT. After that, have a look at some digital image processing techniques. When you feel comfortable that you have a decent understanding of these, you can read up on pattern recognition.
This is only one suggested approach though, there are more ways to achieve this.
