How to find region bounding box of an object in image [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have the image and its mask like this:
How can I use the mask to identify the rectangle bounding box around the object? So the final result should be this (with the background removed):
import urllib.request
from io import BytesIO
from PIL import Image
url_mask = "https://i.stack.imgur.com/rIyJ6.png"
f = urllib.request.urlopen(url_mask)
mask = Image.open(BytesIO(f.read()))
url_im = "https://i.stack.imgur.com/msm7L.jpg"
f2 = urllib.request.urlopen(url_im)
img = Image.open(BytesIO(f2.read()))

There's a fast way to do this (cv2.boundingRect()), but here's a way to think about how to do it from scratch.
Let's call your image im and the mask mask, which I assume are NumPy arrays (or similar). Your goal is to find row indices row_low, row_high and column indices col_low, col_high such that the array im[row_low:row_high, col_low:col_high] is the sub-image that you're looking for.
If mask is an array of pixel values (probably 0's (black) and 255's (white)), start by converting it to a two-dimensional boolean array where an entry being True means you have a white pixel at that part of the mask (this isn't strictly necessary but it helps to see what's going on).
>>> mask.shape
(758, 734, 3) # The original mask, with RGB layers.
>>> mask2d = mask.mean(axis=2) # Get a single black-and-white mask.
>>> mask2d.shape
(758, 734)
>>> bmask = (mask2d == 255) # Or maybe (mask2d >= 200) to be safe.
Now for each row and column, you can use np.max() to determine if that row or column has True in it or not (meaning there is a white pixel in that row or column of the mask). You can do this for all of the columns at once by specifying axis in np.max(): axis=0 will check if there's a True in the column, and axis=1 will check if there's a True in the row.
>>> import numpy as np
>>> bmask.shape # Here's the boolean mask.
(758, 734) # It has 758 rows and 734 columns.
>>> rows_with_white = np.max(bmask, axis=1)
>>> cols_with_white = np.max(bmask, axis=0)
# Check shapes.
>>> rows_with_white.shape
(758,)
>>> cols_with_white.shape
(734,)
The locations of the first and last True in rows_with_white give you row_low and row_high, respectively, and similarly for cols_with_white. We can get the first location with np.argmax(), which finds the first location of the largest value (which, for boolean arrays, is True). To get the location of the last True, we can simply reverse the array and repeat the process, negating the result. The upper bounds are therefore negative indices, which indicate that we're counting backward from the end.
>>> row_low = np.argmax(rows_with_white)
>>> row_high = -np.argmax(rows_with_white[::-1])
>>> col_low = np.argmax(cols_with_white)
>>> col_high = -np.argmax(cols_with_white[::-1])
>>> print((row_low, row_high), (col_low, col_high))
(85, -85) (174, -164)
Now that you have the indices, you can simply slice the original image to get the cropped one.
>>> im_cropped = im[row_low:row_high, col_low:col_high]
And here's the whole thing put together, which assumes you already have mask and im defined.
>>> import numpy as np
>>> bmask = (mask.mean(axis=2) == 255)
>>> rows_with_white = np.max(bmask, axis=1)
>>> cols_with_white = np.max(bmask, axis=0)
>>> row_low = np.argmax(rows_with_white)
>>> row_high = -np.argmax(rows_with_white[::-1])
>>> col_low = np.argmax(cols_with_white)
>>> col_high = -np.argmax(cols_with_white[::-1])
>>> im_cropped = im[row_low:row_high, col_low:col_high]
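One caveat with the negative-index trick above: if the white region touches the last row or column, np.argmax() on the reversed array returns 0, and a slice that ends at -0 (which is just 0) comes back empty. Below is a minimal sketch of a variant that avoids this by using np.flatnonzero() and exclusive upper bounds. The helper name mask_bbox and the toy arrays are made up for illustration, and the sketch assumes the mask contains at least one True.

```python
import numpy as np

def mask_bbox(bmask):
    """Return (row_low, row_high, col_low, col_high) with exclusive upper bounds.

    Hypothetical helper; assumes bmask is a 2-D boolean array with at least one True.
    """
    rows = np.flatnonzero(bmask.any(axis=1))  # indices of rows containing a True
    cols = np.flatnonzero(bmask.any(axis=0))  # indices of columns containing a True
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

# Toy example: a mask whose white region touches the bottom edge.
bmask = np.zeros((6, 8), dtype=bool)
bmask[2:6, 3:5] = True
im = np.arange(6 * 8).reshape(6, 8)

row_low, row_high, col_low, col_high = mask_bbox(bmask)
im_cropped = im[row_low:row_high, col_low:col_high]
```

Because the upper bounds are ordinary positive indices, the same slice expression works whether or not the object reaches the image border.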

If you are not using cv2, then you could loop through all pixels and find the xmin, xmax, ymin, and ymax of the pixels that equal 1, since black is usually represented as 0 and white as 1.
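The pixel-by-pixel scan described above could look roughly like this. It is a plain-Python sketch with no libraries; the function name bbox_by_scanning and the small hand-written mask are assumptions for illustration, and any non-zero value is treated as white.

```python
def bbox_by_scanning(mask):
    """Scan every pixel of a 2-D mask (a list of rows) and return
    (xmin, ymin, xmax, ymax) of the non-zero pixels, or None if there are none.
    """
    xmin = ymin = xmax = ymax = None
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:  # white pixel (1, or 255 in an 8-bit mask)
                if xmin is None:
                    # First white pixel seen: it fixes ymin, since rows are
                    # visited from top to bottom.
                    xmin = xmax = x
                    ymin = ymax = y
                else:
                    xmin = min(xmin, x)
                    xmax = max(xmax, x)
                    ymax = y  # rows are visited in increasing order
    if xmin is None:
        return None
    return xmin, ymin, xmax, ymax

mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
bbox = bbox_by_scanning(mask)
```

Here the bounds are inclusive, so cropping a NumPy image would use im[ymin:ymax + 1, xmin:xmax + 1]. For large images this loop is much slower than the vectorized approach.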

Related

Convert large array of floats to colors and write to binary efficiently

I am working with large NetCDF4 files (about 1 GB and up, but less than my 8 GB of memory for now). 99% of the time the data type will be float32. I want to map these values to an array of RGB colors, which I will then write to a binary file to be read by another application for viewing. Because I only need 1 byte for each of R, G, and B, I want to have an array of np.uint8 to represent this. In the end the array will take up 25% less space than the floats. However, as the original data is big, I don't want to keep both the original data and the color data in memory at the same time. For now I provide a color for the low value and a color for the high value. The problem is that in my program, for a short period of time, the color data consists of floats instead of np.uint8, which takes up 3 times as much memory as the original data. Is there a way to skip the float conversion, or at least only have one float in memory, so that I don't take up this much memory? I have provided the relevant code below:
from netCDF4 import Dataset
import numpy as np
import dask.array as da
import gc
import time
import sys
# Read file path
file_path = sys.argv[1]
# Default colors is blue for low and red for high
lowColor = np.array([0, 0, 255], dtype=int)
highColor = np.array([255, 0, 0], dtype=int)
data = Dataset(file_path)
allVariables = data.variables
# Sometimes we have time_bnds, lat_bnds, etc.
# Keep anything that doesn't have 'bnds'
varNames = list(filter(lambda x: 'bnds' not in x, list(allVariables.keys())))
# Remove the dimensions
varNames = list(filter(lambda x: x not in data.dimensions, varNames))
var = varNames[0]
flattened = allVariables[var][:].flatten()
origShape = allVariables[var].shape
if isinstance(flattened, np.ma.core.MaskedArray):
    flattened = flattened.filled(np.nan)
# Find the minimum value and the range of values.
# Using these two we can make a percentage of how
# far 'up' each value and simply convert colors
# based on that. Because there's a chance of the data
# having NaNs, I can't use ptp().
lowVal = np.nanmin(flattened)
ptp = np.nanmax(flattened) - lowVal
# Subtract the min from each value and divide by ptp
# and add a dimension for dot product later.
percents = ((flattened - lowVal) / ptp)[np.newaxis, :]
# Remove flattened from memory as it is not needed anymore
flattened = None
gc.collect()
# Calculate the color difference
diff = (highColor - lowColor)[np.newaxis, :].T
# Do the dot product to create a list of colors
# Transpose so each color is each row. Also
# add the low color
colors = lowColor + np.dot(diff, percents).T # All floats here
# Round each value and cast to uint8 and finally reshape to
# the original data
colors = np.round(colors).astype(np.uint8)
colors = colors.reshape(origShape + (3,))
colors.tofile('colors_' + allVariables[var].name + '.bin')
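One possible way to keep the peak memory down is to convert the data in fixed-size chunks, so that only one chunk of float colors exists at any time while the uint8 output is filled in place. The following is only a sketch under assumptions: the function name colorize_in_chunks and the chunk_size parameter are made up, NaNs are mapped to black (0, 0, 0) as an arbitrary choice, and the data is assumed to have a non-zero value range.

```python
import numpy as np

def colorize_in_chunks(values, low_color, high_color, chunk_size=1_000_000):
    """Map values linearly between two RGB colors, one chunk at a time.

    Hypothetical helper: only chunk_size x 3 floats are alive at once.
    Assumes nanmax(values) > nanmin(values); NaNs become (0, 0, 0).
    """
    flat = values.reshape(-1)  # a view when values is contiguous
    low_val = np.nanmin(flat)
    span = np.nanmax(flat) - low_val
    low = np.asarray(low_color, dtype=np.float32)
    diff = np.asarray(high_color, dtype=np.float32) - low
    out = np.empty((flat.size, 3), dtype=np.uint8)
    for start in range(0, flat.size, chunk_size):
        stop = start + chunk_size
        pct = (flat[start:stop] - low_val) / span    # fraction in [0, 1]
        chunk = low + pct[:, np.newaxis] * diff      # float colors, this chunk only
        out[start:stop] = np.rint(np.nan_to_num(chunk)).astype(np.uint8)
    return out.reshape(values.shape + (3,))

values = np.array([[0.0, 0.5], [1.0, np.nan]], dtype=np.float32)
colors = colorize_in_chunks(values, [0, 0, 255], [255, 0, 0], chunk_size=3)
```

The result can be written out with colors.tofile(...) as in the code above. The chunk size trades speed for memory: larger chunks mean fewer Python-level iterations but a larger temporary float array.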

Mode on a specific axis in numpy array

Goal
Given a list of images, I'd like to create a new image where each pixel contains the values (R,G,B) that occurred most frequently in the input list at that location.
Details
Input: A list L that has length >=2. Each image/object in the list is a float32 numpy array with dimensions (288, 512, 3) where 3 represents the R/G/B color channels.
Output: A numpy array with the same shape (288,512,3). If there is no pixel that occurred most frequently, any of the pixels for that location can be returned.
Attempt
image = stats.mode(L)[0][0]
The problem with this approach is that it looks at each R/G/B value of a pixel individually. But I want a pixel to only be considered the same as another pixel if all the color channels match (i.e. R1=R2, G1=G2, B1=B2).

Try this:
import numpy as np
import scipy.stats
def packRGB(RGB):
    # Pack R, G, B into one 24-bit integer per pixel. The cast assumes the
    # float32 channels hold whole numbers in 0..255.
    RGB = np.asarray(RGB).astype(np.int64)
    return np.left_shift(RGB, [0, 8, 16]).sum(-1)
def unpackRGB(i24):
    B = np.right_shift(i24, 16)
    G = np.right_shift(i24, 8) - np.left_shift(B, 8)
    R = i24 - np.left_shift(G, 8) - np.left_shift(B, 16)
    # Stack on the last axis so non-square images keep their (rows, cols) order
    return np.stack([R, G, B], axis=-1)
def img_mode(imgs_list, average_singles = True):
    imgs = np.array(imgs_list) #(10, 100, 100, 3)
    imgs24 = packRGB(imgs) # (10, 100, 100)
    mode, count = scipy.stats.mode(imgs24, axis = 0) # (1, 100,100)
    mode, count = mode.squeeze(), count.squeeze() #(100, 100)
    if average_singles:
        out = np.empty(imgs.shape[1:])
        out[count == 1] = np.rint(np.average(imgs[:, count == 1], axis = 0))
        out[count > 1] = unpackRGB(mode[count > 1])
    else:
        out = unpackRGB(mode)
    return out
EDIT: fixed an error and added the option from your other question: any value in the set if there is no mode (average_singles = False), which should be faster because it needs no division or rounding. scipy.stats.mode returns the lowest value, which in this case will be the pixel with the lowest blue value. You might also want to try the median, as the mode is going to be unstable to very small differences in the inputs (especially if there are only ten).
This will also be a lot slower than, for instance, Photoshop's statistics function (I assume you're trying to do something like this), as you'd want to parallelize the function as well to make it time-efficient.

Count the number of pixels by color from an image loaded into a numpy array

I'm trying to count the number of pixels in a weather radar image for each dBZ reflectivity level (the colored blocks of green, orange, yellow, red, etc.) so I can "score" the radar image based on the type of echoes.
I'm new to numpy and numpy arrays, but I know it can be very efficient when I'm working with the individual pixels in an image, so I'd like to learn more.
I'm not even sure I'm selecting the pixels correctly, but I think I'm getting close.
I have a sample of using both numpy and basic pixel iteration to count the number of green pixels with an RGBA of (1, 197, 1, 255).
Hopefully I'm close and someone can give me guidance on how to select the pixels using numpy and then count them:
import io
import numpy as np
import PIL.Image
import urllib2
import sys
color_dbz_20 = (2, 253, 2, 255)
color_dbz_25 = (1, 197, 1, 255)
color_dbz_30 = (0, 142, 0, 255)
url = 'http://radar.weather.gov/ridge/RadarImg/N0R/DLH_N0R_0.gif'
image_bytes = io.BytesIO(urllib2.urlopen(url).read())
image = PIL.Image.open(image_bytes)
image = image.convert('RGBA')
total_pixels = image.height * image.width
# Count using numpy
np_pixdata = np.array(image)
# Didn't work, gave me the total size:
# np_counter = np_pixdata[(np_pixdata == color_dbz_20)].size
np_counter = np.count_nonzero(np_pixdata[(np_pixdata == color_dbz_20)])
# Count using pillow
pil_pixdata = image.load()
pil_counter = 0
for y in xrange(image.size[1]):
    for x in xrange(image.size[0]):
        if pil_pixdata[x, y] == color_dbz_20:
            pil_counter += 1
print "Numpy Count: %d" % np_counter
print "Pillow Count: %d" % pil_counter
Output:
Numpy Count: 134573
Pillow Count: 9967

The problem is that the numpy array has size X * Y * 4, and the comparison with your tuple is done element by element, so each entry of the result refers to a single channel value, not to a whole pixel. That's the reason why your:
np_counter = np_pixdata[(np_pixdata == color_dbz_20)].size
didn't exclude any elements.
You got different counts in the end because you counted the non-zero elements. But some array elements are zero (only in one color channel, but 0 nevertheless), and those are excluded even though you don't want that!
First you want to compare numpy arrays so better convert the color-tuples too:
color_dbz_20 = np.array([2, 253, 2, 255])  # and likewise for the other colors
To get the real result for your condition you must use np.all along axis=2:
np.all(np_pixdata == color_dbz_20, axis=2)
This checks if the values along axis 2 (colors) are equal to the ones in your color_dbz_20 and this for each pixel. To get the sum of all the matches:
np.sum(np.all(np_pixdata == color_dbz_20, axis=2)) # Sum of boolean array is integer!
which gives you the number of pixels where the condition is True. True is interpreted as 1 and False as 0, which is why the sum works. Alternatively, you could use count_nonzero instead of sum here. This always assumes you created color_dbz_20 as an np.array.
If the image has a different dimensionality and is not width * height * depth, then you just need to adjust the axis in np.all to the dimension where the colors are (the one with length 4).
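To make the np.all approach concrete, here is a small self-contained sketch on a made-up 2 x 3 RGBA array that stands in for np.array(image). The np.unique call at the end is an addition not mentioned above: it tallies every distinct color in one pass, which may be convenient when scoring several dBZ levels at once.

```python
import numpy as np

# Toy 2 x 3 RGBA image standing in for np.array(image).
green = (2, 253, 2, 255)
black = (0, 0, 0, 255)
np_pixdata = np.array([[green, black, green],
                       [black, black, green]], dtype=np.uint8)

# Count pixels whose four channels all match the target color.
target = np.array(green, dtype=np.uint8)
green_count = int(np.sum(np.all(np_pixdata == target, axis=2)))

# Tally every distinct color at once: flatten to (n_pixels, 4), count unique rows.
colors, counts = np.unique(np_pixdata.reshape(-1, 4), axis=0, return_counts=True)
tally = {tuple(int(c) for c in color): int(n) for color, n in zip(colors, counts)}
```

Note that the black pixels contain zeros in three channels, which is exactly the situation where counting non-zero channel values instead of whole pixels goes wrong.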

Breaking down numpy code [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have been meticulously reading documentation and rereading/running the below code in order to understand exactly what is occurring. There are still gaps in my knowledge though. I wanted to present the code to you, with comments, which signify the gaps in my knowledge that hopefully some of you are willing to fill.
So here are my requests, friends:
1) Help me fill in gaps in my knowledge
2) Explain what is going on here step by step in a non-technical and simple format.
import numpy
import scipy.misc
import matplotlib.pyplot
lena = scipy.misc.lena()
''' Generates an artificial range within the framework of the original array (Which is an image)
This artificial range will be paired with another one and used to 'climb'
Through the original array and make changes'''
def get_indices(size):
    arr = numpy.arange(size)
    #This sets every fourth element to False? How?
    return arr % 4 == 0
lena1 = lena.copy()
xindices = get_indices(lena.shape[0])
yindices = get_indices(lena.shape[1])
'''I am unsure of HOW the below code is executing. I know something is being
Set to zero, but what? And how can I verify it?'''
lena[xindices, yindices] = 0
#What does the argument 211 do exactly?
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena1)
matplotlib.pyplot.show()
Thanks mates!

Using the Python debugger is always useful to step through your code while it is executing. Write the following in any place you choose:
import pdb; pdb.set_trace()
Execution will be stopped and you can inspect any variable, use any defined functions, and advance line by line.
Here you have a commented version of your code. The comment on the function is transformed into a docstring with a doctest that could be executed.
import numpy
import scipy.misc
import matplotlib.pyplot
# Get classic image processing example image, Lena, at 8-bit grayscale
# bit-depth, 512 x 512 size.
lena = scipy.misc.lena()
# lena is now a Numpy array of integers, between 245 and 25, of 512 rows and
# 512 columns.
def get_indices(size):
    """
    Returns each fourth index in a Numpy vector of the passed in size.
    Specifically, return a vector of booleans, where all indices are set to
    False except those of every fourth element. This vector can be used to
    index another Numpy array and select *only* those elements. Example use:
    >>> import numpy as np
    >>> vector = np.array([0, 1, 2, 3, 4])
    >>> get_indices(vector.size)
    array([ True, False, False, False, True], ...)
    """
    arr = numpy.arange(size)
    return arr % 4 == 0
# Keep a copy of the original image
lena1 = lena.copy()
# Use the defined function to get every fourth index, first in the x direction,
# then in the y direction
xindices = get_indices(lena.shape[0])
yindices = get_indices(lena.shape[1])
# Set every pixel that equals true in the vectors further up to 0. This
# selects **each fourth pixel on the diagonal** (from up left to bottom right).
lena[xindices, yindices] = 0
# Create a Matplotlib figure laid out as a grid of 2 rows and 1 column, and
# select its 1st cell (the top one). The three digits mean rows, columns, index:
# a later call to `subplot(212)` selects the bottom cell of the same vertical
# layout, whereas `subplot(121)` would give a horizontal layout of one row and
# two columns.
matplotlib.pyplot.subplot(211)
# Show the unaltered image on the first subplot
matplotlib.pyplot.imshow(lena1)
# You could plot the modified original image in the second subplot, and compare
# to the unmodified copy by issuing:
#matplotlib.pyplot.subplot(212)
#matplotlib.pyplot.imshow(lena)
matplotlib.pyplot.show()
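The "every fourth pixel on the diagonal" behaviour can be checked on a small array. The sketch below uses a made-up 8 x 8 array of ones instead of the Lena image. It also shows np.ix_, which is not part of the original code, as one way to select the full grid of every fourth row and column instead of only the diagonal.

```python
import numpy as np

def get_indices(size):
    # True at positions 0, 4, 8, ...
    return np.arange(size) % 4 == 0

img = np.ones((8, 8), dtype=int)
rows = get_indices(img.shape[0])
cols = get_indices(img.shape[1])

# Two boolean vectors are paired up element by element: (0, 0) and (4, 4).
diagonal = img.copy()
diagonal[rows, cols] = 0
zeros_on_diagonal = [(int(r), int(c)) for r, c in np.argwhere(diagonal == 0)]

# np.ix_ builds the full grid instead: every selected row with every selected column.
grid = img.copy()
grid[np.ix_(rows, cols)] = 0
zeros_on_grid = int((grid == 0).sum())
```

Inspecting np.argwhere(array == 0) like this is also a simple way to verify which pixels were actually set to zero.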

filling gaps on an image using numpy and scipy

The image (test.tif) is attached.
The np.nan values are the whitest region.
How can I fill that whitest region using a gap-filling algorithm that uses values from the neighbours?
from scipy import ndimage
data = ndimage.imread('test.tif')

As others have suggested, scipy.interpolate can be used. However, it requires fairly extensive index manipulation to get this to work.
Complete example:
from pylab import *
import numpy
import scipy.ndimage
import scipy.interpolate
import pdb
data = scipy.ndimage.imread('data.png')
# a boolean array of shape (height, width) which is False where there are missing values and True where there are valid (non-missing) values
mask = ~( (data[:,:,0] == 255) & (data[:,:,1] == 255) & (data[:,:,2] == 255) )
# array of (number of points, 2) containing the x,y coordinates of the valid values only
xx, yy = numpy.meshgrid(numpy.arange(data.shape[1]), numpy.arange(data.shape[0]))
xym = numpy.vstack( (numpy.ravel(xx[mask]), numpy.ravel(yy[mask])) ).T
# the valid values in the first, second, third color channel, as 1D arrays (in the same order as their coordinates in xym)
data0 = numpy.ravel( data[:,:,0][mask] )
data1 = numpy.ravel( data[:,:,1][mask] )
data2 = numpy.ravel( data[:,:,2][mask] )
# three separate interpolators for the separate color channels
interp0 = scipy.interpolate.NearestNDInterpolator( xym, data0 )
interp1 = scipy.interpolate.NearestNDInterpolator( xym, data1 )
interp2 = scipy.interpolate.NearestNDInterpolator( xym, data2 )
# interpolate the whole image, one color channel at a time
result0 = interp0(numpy.ravel(xx), numpy.ravel(yy)).reshape( xx.shape )
result1 = interp1(numpy.ravel(xx), numpy.ravel(yy)).reshape( xx.shape )
result2 = interp2(numpy.ravel(xx), numpy.ravel(yy)).reshape( xx.shape )
# combine them into an output image
result = numpy.dstack( (result0, result1, result2) )
imshow(result)
show()
Output:
This passes to the interpolator all values we have, not just the ones next to the missing values (which may be somewhat inefficient). It also interpolates every point in the output, not just the missing values (which is extremely inefficient). A better way is to interpolate just the missing values, and then patch them into the original image. This is just a quick working example to get started :)

I think viena's question is more related to an inpainting problem.
Here are some ideas:
In order to fill the gaps in B/W images you can use some filling algorithm like scipy.ndimage.morphology.binary_fill_holes. But you have a gray level image, so you can't use it.
I suppose that you don't want to use a complex inpainting algorithm. My first suggestion is: don't try to use the nearest gray value (you don't know the real value of the NaN pixels). Using the nearest value will give a dirty result. Instead, I would suggest filling the gaps with some other value (e.g. the mean of the column, as in the example below). You can do it without coding by using scikit-learn:
Source:
>>> from sklearn.preprocessing import Imputer
>>> imp = Imputer(strategy="mean")
>>> a = np.random.random((5,5))
>>> a[(1,4,0,3),(2,4,2,0)] = np.nan
>>> a
array([[ 0.77473361, 0.62987193, nan, 0.11367791, 0.17633671],
[ 0.68555944, 0.54680378, nan, 0.64186838, 0.15563309],
[ 0.37784422, 0.59678177, 0.08103329, 0.60760487, 0.65288022],
[ nan, 0.54097945, 0.30680838, 0.82303869, 0.22784574],
[ 0.21223024, 0.06426663, 0.34254093, 0.22115931, nan]])
>>> a = imp.fit_transform(a)
>>> a
array([[ 0.77473361, 0.62987193, 0.24346087, 0.11367791, 0.17633671],
[ 0.68555944, 0.54680378, 0.24346087, 0.64186838, 0.15563309],
[ 0.37784422, 0.59678177, 0.08103329, 0.60760487, 0.65288022],
[ 0.51259188, 0.54097945, 0.30680838, 0.82303869, 0.22784574],
[ 0.21223024, 0.06426663, 0.34254093, 0.22115931, 0.30317394]])
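The Imputer class shown above was removed from newer scikit-learn releases in favor of sklearn.impute.SimpleImputer. The same column-mean fill can also be sketched with NumPy alone, as below; the small array is made up for illustration.

```python
import numpy as np

a = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [5.0, 4.0, 9.0]])

col_means = np.nanmean(a, axis=0)             # mean of each column, ignoring NaNs
filled = np.where(np.isnan(a), col_means, a)  # broadcast column means into the gaps
```

Using axis=1 with col_means[:, np.newaxis] style broadcasting would give row means instead, if those suit the image better.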
The dirty solution that uses the Nearest values can be this:
1) Find the perimeter points of the NaN regions
2) Compute all the distances between the NaN points and the perimeter
3) Replace the NaNs with the nearest point's gray value
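The three steps above can be sketched with plain NumPy as follows. This is a brute-force illustration, not an efficient implementation: it searches all valid pixels rather than only the perimeter points (the nearest valid pixel always lies on the perimeter anyway), and its memory use grows with the number of NaNs times the number of valid pixels. The function name fill_nan_nearest is made up, and ties between equally distant pixels are broken arbitrarily. For real images, scipy.ndimage.distance_transform_edt with return_indices=True is likely a better fit.

```python
import numpy as np

def fill_nan_nearest(data):
    """Replace each NaN in a 2-D array with the value of the closest valid pixel.

    Hypothetical brute-force helper; assumes at least one valid pixel.
    """
    nan_mask = np.isnan(data)
    nan_pts = np.argwhere(nan_mask)      # (M, 2) row/col of missing pixels
    valid_pts = np.argwhere(~nan_mask)   # (V, 2) row/col of valid pixels
    # Squared Euclidean distance from every NaN pixel to every valid pixel.
    d2 = ((nan_pts[:, None, :] - valid_pts[None, :, :]) ** 2).sum(axis=-1)
    nearest = valid_pts[d2.argmin(axis=1)]
    filled = data.copy()
    # Boolean-mask assignment and np.argwhere both follow row-major order.
    filled[nan_mask] = data[nearest[:, 0], nearest[:, 1]]
    return filled

data = np.array([[1.0, np.nan, np.nan, 4.0],
                 [1.0, np.nan, np.nan, 4.0]])
filled = fill_nan_nearest(data)
```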

If you want values from the nearest neighbors, you could use the NearestNDInterpolator from scipy.interpolate. There are other interpolators you can consider as well.
You can locate the X,Y index values for the NaN values with:
import numpy as np
nan_locs = np.where(np.isnan(data))
There are some other options for the interpolation as well. One option is to replace NaN values with the results of a median filter (but your areas are kind of large for this). Another option might be grayscale dilation. The correct interpolation depends on your end domain.
If you haven't used a SciPy ND interpolator before, you'll need to provide X, Y, and value data to fit the interpolator to, and then the X and Y data for the points to interpolate at. You can do this using the where example above as a template.

OpenCV has some image in-painting algorithms that you could use. You just need to provide a binary mask which indicates which pixels should be in-painted.
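import cv2
import numpy as np
from scipy import ndimage
data = ndimage.imread("test.tif")
# cv2.inpaint expects an 8-bit, single-channel mask; non-zero marks the pixels to fill
mask = np.isnan(data).astype(np.uint8)
inpainted_img = cv2.inpaint(data, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)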
