Is there a way to create a "ghost" of an image and add it back into the original (as shown below) using image processing? By "ghosted" I specifically mean creating a copy of the original image, shifting it slightly, and adding it back in with increased transparency. I tried OpenCV's addWeighted function in a variety of color spaces (RGB, HSV, ...), but the images never looked correct, mainly because the colors shifted.
================================================================
Edit: 2022-10-16 21:15
When I am using OpenCV to combine images I produce the image below. The desired output is the image to the right above (produced via matplotlib)
import cv2
import numpy as np
def shift_image(img, dx=0, dy=0):
    # Translate img by (dx, dy) pixels; uncovered areas are filled with zeros
    M = np.float32([[1, 0, dx], [0, 1, dy]])
    return cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
img = cv2.imread('image/0.png', cv2.IMREAD_COLOR)
mask = 255 * (~((img == 0).all(axis=2))).astype(np.uint8)
masked_img = cv2.merge([*(cv2.split(img)), mask],4)
shifted_img = shift_image(masked_img, 5, -2)
merged_img = cv2.addWeighted(masked_img, 1, shifted_img, 0.5, 0)
mask = merged_img[:,:,-1] == 0
merged_img[mask, :3] = 0
merged_img[mask, -1] = 255
cv2.imwrite("test.png", merged_img)
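The ghosting idea described above can be sketched with NumPy alone. This is a minimal sketch, assuming "ghost" means alpha-blending a shifted copy over the original only where the copy has content, and assuming a pure black background. The function name `add_ghost` and the default `alpha` are illustrative, not part of the original code. The blend is done in float with weights that sum to 1, so colors are not pushed toward saturation as they can be with weights of 1 and 0.5.

```python
import numpy as np

def add_ghost(img, dx=5, dy=-2, alpha=0.5):
    """Blend a copy of img, shifted by (dx, dy) pixels, over img itself.

    img is an (H, W, 3) uint8 array whose background is pure black.
    """
    h, w = img.shape[:2]
    shifted = np.zeros_like(img)
    # Source and destination windows for the shift; pixels pushed outside are dropped.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = img[src_y, src_x]

    # Only blend where the shifted copy actually has content.
    ghost_mask = shifted.any(axis=2, keepdims=True)
    base = img.astype(np.float32)
    blended = (1.0 - alpha) * base + alpha * shifted.astype(np.float32)
    out = np.where(ghost_mask, blended, base)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Pixels not covered by the ghost keep their original values, so the unshifted image is never darkened outside the overlap.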
I have Lego cubes forming a 4x4 shape, and I'm trying to infer the status of each zone inside the image:
empty or full, and, if full, whether the color is yellow or blue.
To simplify my work, I added red markers to define the border of the shape, since the camera sometimes shakes.
Here is a clear image of the shape I'm trying to detect taken by my phone camera
( EDIT : Note that this image is not my input image, it is used just to demonstrate the required shape clearly ).
The shape from the side camera that I'm supposed to use looks like this:
(EDIT : Now this is my input image)
To focus on the working zone, I created a mask:
What I have tried so far is to locate the red markers by color (a simple threshold, without the HSV color space) as follows:
import numpy as np
import matplotlib.pyplot as plt
import cv2
img = cv2.imread('sample.png')
RGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
mask = cv2.imread('mask.png')
masked = np.minimum(RGB, mask)
masked[masked[...,1]>25] = 0
masked[masked[...,2]>25] = 0
masked = masked[..., 0]
masked = cv2.medianBlur(masked,5)
plt.imshow(masked, cmap='gray')
plt.show()
and I have spotted the markers so far:
But I'm still confused:
How can I precisely detect the external borders of the desired zone, and the internal borders (the borders of each yellow, blue or green Lego cube) inside the red markers?
Thanks in advance for your kind advice.
I tested this approach using your undistorted image. Suppose you have the rectified camera image, so you see the Lego bricks from a "bird's eye" perspective. The idea is to use the red markers to estimate a center rectangle and crop that portion of the image. Then, as you know each brick's dimensions (and they are constant), you can trace a grid and extract each cell of the grid. You can compute some HSV-based masks to estimate the dominant color in each cell, and that way you know whether the space is occupied by a yellow or blue brick, or is empty.
These are the steps:
1. Get an HSV mask of the red markers
2. Use each marker's coordinates to estimate the center rectangle
3. Crop the center rectangle
4. Divide the rectangle into cells - this is the grid
5. Run a series of HSV-based masks on each cell and compute the dominant color
6. Label each cell with the dominant color
Let's see the code:
# Importing cv2 and numpy:
import numpy as np
import cv2
# image path
path = "D://opencvImages//"
fileName = "Bg9iB.jpg"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Store a deep copy for results:
inputCopy = inputImage.copy()
# Convert the image to HSV:
hsvImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2HSV)
# The HSV mask values (Red):
lowerValues = np.array([127, 0, 95])
upperValues = np.array([179, 255, 255])
# Create the HSV mask
mask = cv2.inRange(hsvImage, lowerValues, upperValues)
The first part is very straightforward. You set the HSV range and use cv2.inRange to get a binary mask of the target color. This is the result:
We can further improve the binary mask using some morphology. Let's apply a closing with a somewhat big structuring element and 10 iterations. We want those markers as clearly defined as possible:
# Set kernel (structuring element) size:
kernelSize = 5
# Set operation iterations:
opIterations = 10
# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Perform closing:
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)
Which yields:
Very nice. Now, let's detect contours on this mask. We will approximate each contour to a bounding box and store its starting point and dimensions. The idea being that, while we will detect every contour, we are not sure of their order. We can sort this list later and get each bounding box from left to right, top to bottom to better estimate the central rectangle. Let's detect contours:
# Create a deep copy, convert it to BGR for results:
maskCopy = mask.copy()
maskCopy = cv2.cvtColor(maskCopy, cv2.COLOR_GRAY2BGR)
# Find the big contours/blobs on the filtered image:
contours, hierarchy = cv2.findContours(mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
# Bounding Rects are stored here:
boundRectsList = []
# Process each contour 1-1:
for i, c in enumerate(contours):
    # Approximate the contour to a polygon:
    contoursPoly = cv2.approxPolyDP(c, 3, True)
    # Convert the polygon to a bounding rectangle:
    boundRect = cv2.boundingRect(contoursPoly)
    # Get the bounding rect's data:
    rectX = boundRect[0]
    rectY = boundRect[1]
    rectWidth = boundRect[2]
    rectHeight = boundRect[3]
    # Estimate the bounding rect area:
    rectArea = rectWidth * rectHeight
    # Set a min area threshold
    minArea = 100
    # Filter blobs by area:
    if rectArea > minArea:
        # Store the rect:
        boundRectsList.append(boundRect)
I also created a deep copy of the mask image for further use. Mainly to create this image, which is the result of the contour detection and bounding box approximation:
Notice that I have included a minimum area condition. I want to ignore noise below a certain threshold defined by minArea. Alright, now we have the bounding boxes in the boundRectsList variable. Let's sort these boxes using the Y coordinate:
# Sort the list based on ascending y values:
boundRectsSorted = sorted(boundRectsList, key=lambda x: x[1])
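Sorting by y alone orders the rows, but two markers in the same row can still come out right-to-left if the right one happens to sit a pixel higher. Below is a hedged sketch of a row-aware ordering, assuming exactly four marker boxes arranged in two rows of two. The helper name `sort_rects_grid` and the sample boxes are hypothetical, not part of the original answer.

```python
def sort_rects_grid(rects, per_row=2):
    """Order (x, y, w, h) boxes top to bottom, then left to right in each row."""
    # First split the boxes into rows by their y coordinate.
    by_y = sorted(rects, key=lambda r: r[1])
    ordered = []
    for start in range(0, len(by_y), per_row):
        row = by_y[start:start + per_row]
        # Within one row, order by the x coordinate.
        ordered.extend(sorted(row, key=lambda r: r[0]))
    return ordered

# Made-up boxes: the top-right marker sits slightly higher than the top-left one.
example = [(900, 10, 50, 50), (20, 14, 50, 50), (910, 800, 50, 50), (15, 805, 50, 50)]
print(sort_rects_grid(example))
```

With this ordering, index 0 is always the top-left marker, which is what the inner-point lookup relies on.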
The list is now sorted, and we can enumerate the boxes from left to right, top to bottom, like this: first "row" -> 0, 1, second "row" -> 2, 3. Now we can define the big central rectangle using this info. I call the corners we need "inner points". Notice that the rectangle is defined as a function of all the bounding boxes. For example, its top-left starting point is bounding box 0's bottom-right corner (both x and y). Its width is measured from that point to bounding box 1's bottom-left x coordinate, and its height is measured from that point to bounding box 2's top-right y coordinate. I'm going to loop through each bounding box and extract its relevant dimensions to construct the center rectangle in the following form: (top left x, top left y, width, height). There's more than one way to achieve this. I prefer to use a dictionary to get the relevant data. Let's see:
# Rectangle dictionary:
# Each entry is an index of the currentRect list
# 0 - X, 1 - Y, 2 - Width, 3 - Height
# Additionally: -1 is 0 (no dimension):
pointsDictionary = {0: (2, 3),
1: (-1, 3),
2: (2, -1),
3: (-1, -1)}
# Store center rectangle coordinates here:
centerRectangle = [None]*4
# Process the sorted rects:
rectCounter = 0
for i in range(len(boundRectsSorted)):
    # Get sorted rect:
    currentRect = boundRectsSorted[i]
    # Get the bounding rect's data:
    rectX = currentRect[0]
    rectY = currentRect[1]
    rectWidth = currentRect[2]
    rectHeight = currentRect[3]
    # Draw sorted rect:
    cv2.rectangle(maskCopy, (int(rectX), int(rectY)), (int(rectX + rectWidth),
                  int(rectY + rectHeight)), (0, 255, 0), 5)
    # Get the inner points:
    currentInnerPoint = pointsDictionary[i]
    borderPoint = [None]*2
    # Check coordinates:
    for p in range(2):
        # Check for '0' index:
        idx = currentInnerPoint[p]
        if idx == -1:
            borderPoint[p] = 0
        else:
            borderPoint[p] = currentRect[idx]
    # Draw the border points:
    color = (0, 0, 255)
    thickness = -1
    centerX = rectX + borderPoint[0]
    centerY = rectY + borderPoint[1]
    radius = 50
    cv2.circle(maskCopy, (centerX, centerY), radius, color, thickness)
    # Mark the circle
    org = (centerX - 20, centerY + 20)
    font = cv2.FONT_HERSHEY_SIMPLEX
    cv2.putText(maskCopy, str(rectCounter), org, font,
                2, (0, 0, 0), 5, cv2.LINE_8)
    # Show the circle:
    cv2.imshow("Sorted Rects", maskCopy)
    cv2.waitKey(0)
    # Store the coordinates into list
    if rectCounter == 0:
        centerRectangle[0] = centerX
        centerRectangle[1] = centerY
    elif rectCounter == 1:
        centerRectangle[2] = centerX - centerRectangle[0]
    elif rectCounter == 2:
        centerRectangle[3] = centerY - centerRectangle[1]
    # Increase rectCounter:
    rectCounter += 1
This image shows each inner point with a red circle. Each circle is enumerated from left to right, top to bottom. The inner points are stored in the centerRectangle list:
If you join each inner point you get the center rectangle we have been looking for:
# Check out the big rectangle at the center:
bigRectX = centerRectangle[0]
bigRectY = centerRectangle[1]
bigRectWidth = centerRectangle[2]
bigRectHeight = centerRectangle[3]
# Draw the big rectangle:
cv2.rectangle(maskCopy, (int(bigRectX), int(bigRectY)), (int(bigRectX + bigRectWidth),
int(bigRectY + bigRectHeight)), (0, 0, 255), 5)
cv2.imshow("Big Rectangle", maskCopy)
cv2.waitKey(0)
Check it out:
Now, just crop this portion of the original image:
# Crop the center portion:
centerPortion = inputCopy[bigRectY:bigRectY + bigRectHeight, bigRectX:bigRectX + bigRectWidth]
# Store a deep copy for results:
centerPortionCopy = centerPortion.copy()
This is the central portion of the image:
Cool, now let's create the grid. You know that there must be 4 bricks per width and 4 bricks per height. We can divide the image using this info. I'm storing each sub-image, or cell, in a list. I'm also estimating each cell's center, for additional processing. These are stored in a list too. Let's see the procedure:
# Divide the image into a grid:
verticalCells = 4
horizontalCells = 4
# Cell dimensions (width is split across columns, height across rows)
cellWidth = bigRectWidth / horizontalCells
cellHeight = bigRectHeight / verticalCells
# Store the cells here:
cellList = []
# Store cell centers here:
cellCenters = []
# Loop thru vertical dimension:
for j in range(verticalCells):
    # Cell starting y position:
    yo = j * cellHeight
    # Loop thru horizontal dimension:
    for i in range(horizontalCells):
        # Cell starting x position:
        xo = i * cellWidth
        # Cell Dimensions:
        cX = int(xo)
        cY = int(yo)
        cWidth = int(cellWidth)
        cHeight = int(cellHeight)
        # Crop current cell:
        currentCell = centerPortion[cY:cY + cHeight, cX:cX + cWidth]
        # into the cell list:
        cellList.append(currentCell)
        # Store cell center:
        cellCenters.append((cX + 0.5 * cWidth, cY + 0.5 * cHeight))
        # Draw Cell
        cv2.rectangle(centerPortionCopy, (cX, cY), (cX + cWidth, cY + cHeight), (255, 255, 0), 5)
cv2.imshow("Grid", centerPortionCopy)
cv2.waitKey(0)
This is the grid:
Let's now process each cell individually. Of course, you could process each cell in the previous loop, but I'm not looking for optimization right now; clarity is my priority. We need to generate a series of HSV masks for the target colors: yellow, blue and green (empty). Again, I prefer to use a dictionary for the target colors. I'll generate a mask for each color and count the number of white pixels using cv2.countNonZero. Again, I set a minimum threshold, this time of 10 pixels. With this info I can determine which mask produced the maximum number of white pixels, which gives me the dominant color:
# HSV dictionary - color ranges and color name:
colorDictionary = {0: ([93, 64, 21], [121, 255, 255], "blue"),
1: ([20, 64, 21], [30, 255, 255], "yellow"),
2: ([55, 64, 21], [92, 255, 255], "green")}
# Cell counter:
cellCounter = 0
for c in range(len(cellList)):
    # Get current Cell:
    currentCell = cellList[c]
    # Convert to HSV:
    hsvCell = cv2.cvtColor(currentCell, cv2.COLOR_BGR2HSV)
    # Some additional info:
    (h, w) = currentCell.shape[:2]
    # Process masks:
    maxCount = 10
    cellColor = "None"
    for m in range(len(colorDictionary)):
        # Get current lower and upper range values:
        currentLowRange = np.array(colorDictionary[m][0])
        currentUppRange = np.array(colorDictionary[m][1])
        # Create the HSV mask
        mask = cv2.inRange(hsvCell, currentLowRange, currentUppRange)
        # Get max number of target pixels
        targetPixelCount = cv2.countNonZero(mask)
        if targetPixelCount > maxCount:
            maxCount = targetPixelCount
            # Get color name from dictionary:
            cellColor = colorDictionary[m][2]
    # Get cell center, add an x offset:
    textX = int(cellCenters[cellCounter][0]) - 100
    textY = int(cellCenters[cellCounter][1])
    # Draw text on cell's center:
    font = cv2.FONT_HERSHEY_SIMPLEX
    cv2.putText(centerPortion, cellColor, (textX, textY), font,
                2, (0, 0, 255), 5, cv2.LINE_8)
    # Increase cellCounter:
    cellCounter += 1
cv2.imshow("centerPortion", centerPortion)
cv2.waitKey(0)
This is the result:
From here it is easy to identify the empty spaces on the grid. What I didn't cover was the perspective rectification of your distorted image, but there's plenty of info on how to do that. Hope this helps you out!
Edit:
If you want to apply this approach to your distorted image you need to undo the fish-eye and the perspective distortion. Your rectified image should look like this:
You probably will have to tweak some values because some of the distortion still remains, even after rectification.
I have an image such as this one, which is only black and white:
I would like to obtain only the flooded area of the image with the border using cv2.floodFill, like so (pardon my Paint skills):
Here's my current code:
# Copy the image.
im_floodfill = cv2.resize(actual_map_image, (500, 500)).copy()
# Floodfill from point (X, Y)
cv2.floodFill(im_floodfill, None, (X, Y), (255, 255, 255))
# Display images.
cv2.imshow("Floodfilled Image", im_floodfill)
cv2.waitKey(0)
The output I get is equal to the original image. How can I get only the flooded area with borders?
EDIT: I want to floodfill from any white point inside the "arena", like the red dot (X,Y) in the image. I wish to have only the outer border of the small circles inside the arena and the inner border of the outside walls.
EDIT2: I'm halfway there with this:
# Resize for test purposes
actual_map_image = cv2.resize(actual_map_image, (1000, 1000))
actual_map_image = cv2.cvtColor(actual_map_image, cv2.COLOR_BGR2GRAY)
h, w = actual_map_image.shape[:2]
flood_mask = np.zeros((h+2, w+2), dtype=np.uint8)
connectivity = 8
flood_fill_flags = (connectivity | cv2.FLOODFILL_FIXED_RANGE | cv2.FLOODFILL_MASK_ONLY | 255 << 8)
# Copy the image.
im_floodfill = actual_map_image.copy()
# Floodfill from point inside arena, not inside a black dot
cv2.floodFill(im_floodfill, flood_mask, (w // 2 + 20, h // 2 + 20), 255, None, None, flood_fill_flags)
borders = []
for i in range(len(actual_map_image)):
    borders.append([B-A for A,B in zip(actual_map_image[i], flood_mask[i])])
borders = np.asarray(borders)
borders = cv2.bitwise_not(borders)
# Display images.
cv2.imshow("Original Image", cv2.resize(actual_map_image, (500, 500)))
cv2.imshow("Floodfilled Image", cv2.resize(flood_mask, (500, 500)))
cv2.imshow("Borders", cv2.resize(borders, (500, 500)))
cv2.waitKey(0)
I get this:
However, I feel like this is the wrong way of getting the borders, and they are incomplete.
I think the easiest, and fastest, way to do this is to flood-fill the arena with mid-grey. Then extract just the grey pixels and find their edges. That looks like this, but bear in mind more than half the lines are comments and debug statements :-)
#!/usr/bin/env python3
import cv2
import numpy as np
# Load image as greyscale to use 1/3 of the memory and processing time
im = cv2.imread('arena.png', cv2.IMREAD_GRAYSCALE)
# Floodfill arena area with value 128, i.e. mid-grey
floodval = 128
cv2.floodFill(im, None, (150,370), floodval)
# DEBUG cv2.imwrite('result-1.png', im)
# Extract filled area alone
arena = ((im==floodval) * 255).astype(np.uint8)
# DEBUG cv2.imwrite('result-2.png', arena)
# Find edges and save
edges = cv2.Canny(arena,100,200)
# DEBUG cv2.imwrite('result-3.png',edges)
Here are the 3 steps of debug output showing you the sequence of processing:
result-1.png looks like this:
result-2.png looks like this:
result-3.png looks like this:
By the way, you don't have to write any Python code to do this, as you can just do it in the Terminal with ImageMagick which is included in most Linux distros and is available for macOS and Windows. The method used here corresponds exactly to the method I used in Python above:
magick arena.png -colorspace gray \
-fill gray -draw "color 370,150 floodfill" \
-fill white +opaque gray -canny 0x1+10%+30% result.png
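The "extract the filled area, then find its edges" step can also be sketched with NumPy alone. This is an alternative to Canny, not the method used above: it marks the outermost pixels of the filled region itself, assuming 4-connectivity and treating everything outside the image as unfilled. The function name `mask_boundary` is hypothetical.

```python
import numpy as np

def mask_boundary(mask):
    """Return the pixels of a boolean mask that touch a non-mask 4-neighbour."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so pixels on the image edge count as boundary pixels.
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A pixel is interior when it and its four direct neighbours are all set.
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]   # neighbour above
        & padded[2:, 1:-1]    # neighbour below
        & padded[1:-1, :-2]   # neighbour to the left
        & padded[1:-1, 2:]    # neighbour to the right
    )
    return mask & ~interior
```

With the arrays from the Python version above, the call would be `mask_boundary(im == floodval)`, and multiplying the result by 255 as uint8 gives an image that can be saved.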
How about dilating and then XOR-ing with the original?
kernel = np.ones((3,3), np.uint8)
dilated = cv2.dilate(actual_map_image, kernel, iterations = 1)
borders = cv2.bitwise_xor(dilated, actual_map_image)
That will give you only the borders. I'm not clear whether you want only the circle borders or also the interior borders, but you should be able to remove the borders you don't want based on size.
You can remove the exterior border with a size threshold. Define a function like this:
def size_threshold(bw, minimum, maximum):
    # Keep only connected components whose pixel area lies within [minimum, maximum]
    retval, labels, stats, centroids = cv2.connectedComponentsWithStats(bw)
    for val in np.where((stats[:, cv2.CC_STAT_AREA] < minimum) + (stats[:, cv2.CC_STAT_AREA] > maximum))[0]:
        labels[labels==val] = 0
    return (labels > 0).astype(np.uint8) * 255
result = size_threshold(borders, 0, 500)
Replace 500 with a number larger than the borders you want to keep and smaller than the border you want to lose.
I had to create my own Flood Fill implementation to get what I wanted. I based myself on this one.
def fill(data, start_coords, fill_value, border_value, connectivity=8):
    """
    Flood fill algorithm
    Parameters
    ----------
    data : (M, N) ndarray of uint8 type
        Image with flood to be filled. Not modified; the fill is done on a copy.
    start_coords : tuple
        Length-2 tuple of ints defining (row, col) start coordinates.
    fill_value : int
        Value the flooded area will take after the fill.
    border_value: int
        Value of the color to paint the borders of the filled area with.
    connectivity: 4 or 8
        Connectivity which we use for the flood fill algorithm (4-way or 8-way).
    Returns
    -------
    filled_data: ndarray
        The data with the filled area.
    borders: ndarray
        The borders of the filled area painted with border_value color.
    """
    assert connectivity in [4,8]
    filled_data = data.copy()
    xsize, ysize = filled_data.shape
    orig_value = filled_data[start_coords[0], start_coords[1]]
    stack = set(((start_coords[0], start_coords[1]),))
    if fill_value == orig_value:
        raise ValueError("Filling region with same value already present is unsupported. Did you already fill this region?")
    border_points = []
    while stack:
        x, y = stack.pop()
        if filled_data[x, y] == orig_value:
            filled_data[x, y] = fill_value
            if x > 0:
                stack.add((x - 1, y))
            if x < (xsize - 1):
                stack.add((x + 1, y))
            if y > 0:
                stack.add((x, y - 1))
            if y < (ysize - 1):
                stack.add((x, y + 1))
            if connectivity == 8:
                if x > 0 and y > 0:
                    stack.add((x - 1, y - 1))
                if x > 0 and y < (ysize - 1):
                    stack.add((x - 1, y + 1))
                if x < (xsize - 1) and y > 0:
                    stack.add((x + 1, y - 1))
                if x < (xsize - 1) and y < (ysize - 1):
                    stack.add((x + 1, y + 1))
        else:
            # Neither original nor already filled: this pixel is a border
            if filled_data[x, y] != fill_value:
                border_points.append([x,y])
    # Fill all image with white
    borders = filled_data.copy()
    borders.fill(255)
    # Paint borders
    for x,y in border_points:
        borders[x, y] = border_value
    return filled_data, borders
The only thing I did was add the else condition. If the point does not have a value equal to orig_value or fill_value, then it is a border, so I append it to a list that contains the points of all borders. Then I only paint the borders.
I was able to get the following images with this code:
# Resize for test purposes
actual_map_image = cv2.resize(actual_map_image, (500, 500))
actual_map_image = cv2.cvtColor(actual_map_image, cv2.COLOR_BGR2GRAY)
h, w = actual_map_image.shape[:2]
# Start coordinates must be integers to be usable as array indices
filled_data, borders = fill(actual_map_image, [h//2 + 20, w//2 + 20], 127, 0, connectivity=8)
cv2.imshow("Original Image", actual_map_image)
cv2.imshow("Filled Image", filled_data)
cv2.imshow("Borders", borders)
cv2.waitKey(0)
The one on the right was what I was aiming for. Thank you all!
I want to know whether a given color appears in an image. In this case the color under study is green, so its RGB value is (0, 255, 0).
I apply the following code:
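import cv2
import numpy as np
img = cv2.imread('img.jpg')
L1 = [0, 255, 0]
matches = np.all(img == L1, axis=2)
result = np.zeros_like(img)
# Print whether the color was found (result is still all black at this point)
print(matches.any())
result[matches] = [255, 0, 255]
cv2.imwrite('resultado.jpg', result)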
Basically:
1. I load the image that I want to analyze.
2. I define the RGB value I want to find.
3. I check whether this color (green) appears in the image.
4. I create a completely black image of the same size and call it "result".
5. I print a Boolean showing whether that color appears.
6. I paint the green area red in "result".
7. Finally, I save the result.
Below are the image under study and what gets painted red.
Image to study:
Result:
Why isn't the whole green box painted red? Why just those few dots?
Thank you!
The problem is that the green area is NOT made up only of [0, 255, 0] pixels, as you assume. With OT21t.jpg as your input image, when I ran:
import cv2
img = cv2.imread('OT21t.jpg')
print(img[950,1300])
I got [ 2 255 1], so it is not [0,255,0]. Keep in mind that saving a .jpg image is usually a lossy process: part of the data may be discarded to allow a smaller file size (for more about that, search for "lossy compression").
Here is a script that does what you want. I used numpy too, so it won't be difficult to adapt it to your needs.
This script will find a colour and replace it by another:
import numpy
from PIL import Image
# Force 3-channel RGB so the 3-element colors below match the pixel layout
im = numpy.array(Image.open("/path/to/img.jpg").convert("RGB"))
tol = 4 # tolerance (0 if you want an exact match)
target_color = [0, 255, 0] # color to change
replace_color = [255, 0, 255] # color to use to paint the zone
for y, line in enumerate(im):
    for x, px in enumerate(line):
        # int() avoids uint8 wrap-around in the subtraction
        if all((abs(int(px[i]) - target_color[i]) <= tol for i in range(3))):
            im[y][x] = replace_color
Image.fromarray(im).save("./Desktop/img.png")
This one will be black with only the match coloured in the replace colour:
import numpy
from PIL import Image
# Force 3-channel RGB so the 3-element colors below match the pixel layout
im = numpy.array(Image.open("/path/to/img.jpg").convert("RGB"))
new_im = numpy.zeros_like(im)
tol = 4 # tolerance (0 if you want an exact match)
target_color = [0, 255, 0] # color to change
replace_color = [255, 0, 255] # color to use to paint the zone
for y, line in enumerate(im):
    for x, px in enumerate(line):
        # int() avoids uint8 wrap-around in the subtraction
        if all((abs(int(px[i]) - target_color[i]) <= tol for i in range(3))):
            new_im[y][x] = replace_color
Image.fromarray(new_im).save("./Desktop/img.png")
What is missing from your script is some tolerance, because your green might not be a perfect green.
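The same tolerance test can be done without Python-level loops, which matters on large images. This is a minimal sketch, assuming a per-channel tolerance as in the scripts above. The function name `match_color` and the tiny sample image are made up for illustration.

```python
import numpy as np

def match_color(im, target, tol=4):
    """Boolean (H, W) mask of pixels within tol of target on every channel."""
    # Cast to a signed type first: subtracting uint8 values would wrap around.
    diff = np.abs(im[..., :3].astype(np.int16) - np.asarray(target[:3], dtype=np.int16))
    return (diff <= tol).all(axis=-1)

# Made-up 1x2 image: a "nearly green" pixel and a darker green one.
im = np.array([[[2, 255, 1], [0, 200, 0]]], dtype=np.uint8)
mask = match_color(im, [0, 255, 0])

# Paint the matches on a black image, as in the second script above.
result = np.zeros_like(im)
result[mask] = [255, 0, 255]
```

Setting `tol=0` reduces this to an exact match, which is why a JPEG-compressed green area only matches in a few scattered pixels.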
It is often more appropriate to use the "Hue, Saturation and Lightness" system rather than RGB to separate out colours in images - see Wikipedia article here.
So you might consider something like this:
#!/usr/local/bin/python3
import numpy as np
from PIL import Image
# Open image and make RGB and HSV versions
RGBim = Image.open("seaside.jpg")
HSVim = RGBim.convert('HSV')
# Make numpy versions
RGBna = np.array(RGBim)
HSVna = np.array(HSVim)
# Extract Hue
H = HSVna[:,:,0]
# Find all green pixels, i.e. where 110 < Hue < 130
lo,hi = 110,130
# Rescale to 0-255, rather than 0-360 because we are using uint8
lo = int((lo * 255) / 360)
hi = int((hi * 255) / 360)
green = np.where((H>lo) & (H<hi))
# Make all green pixels red in original image
RGBna[green] = [255,0,0]
count = green[0].size
print("Pixels matched: {}".format(count))
Image.fromarray(RGBna).save('result.png')
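Hue ranges are easy to get wrong because libraries scale hue differently: Pillow's 'HSV' mode stores hue as 0-255, while OpenCV's 8-bit HSV stores it as degrees divided by two (0-179). The helper names below are hypothetical; the sketch only shows the arithmetic for converting a hue given in degrees.

```python
def hue_degrees_to_pil(deg):
    """Hue in degrees (0-360) -> Pillow 'HSV' mode scale (0-255)."""
    return int(deg * 255 / 360)

def hue_degrees_to_opencv(deg):
    """Hue in degrees (0-360) -> OpenCV 8-bit HSV scale (0-179)."""
    return int(deg / 2) % 180

# The green band used above, expressed on both scales.
for deg in (110, 130):
    print(deg, hue_degrees_to_pil(deg), hue_degrees_to_opencv(deg))
```

A threshold tuned for one library therefore cannot be reused as-is with the other.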
I want to convert a 3-channel RGB image to an index image with Python. It's used for handling the labels when training a deep net for semantic segmentation. By index image I mean an image with one channel where each pixel is an index, starting from zero. Naturally, both images should have the same size. The conversion is based on the following mapping, given as a Python dict:
color2index = {
(255, 255, 255) : 0,
(0, 0, 255) : 1,
(0, 255, 255) : 2,
(0, 255, 0) : 3,
(255, 255, 0) : 4,
(255, 0, 0) : 5
}
I've implemented a naive function:
def im2index(im):
    """
    turn a 3 channel RGB image to 1 channel index image
    """
    assert len(im.shape) == 3
    height, width, ch = im.shape
    assert ch == 3
    m_lable = np.zeros((height, width, 1), dtype=np.uint8)
    for w in range(width):
        for h in range(height):
            b, g, r = im[h, w, :]
            m_lable[h, w, :] = color2index[(r, g, b)]
    return m_lable
The input im is a numpy array created by cv2.imread(). However, this code is really slow.
Since im is a numpy array, I first tried numpy's ufunc machinery with something like this:
RGB2index = np.frompyfunc(lambda x: color2index(tuple(x)))
indices = RGB2index(im)
But it turns out that the ufunc takes only one element at a time. I was unable to pass the function three arguments (the RGB values) at once.
So are there any other ways to optimize this?
The mapping does not have to be stored that way if a more efficient data structure exists. I noticed that accessing a Python dict does not cost much time, but casting from a numpy array to a tuple (which is hashable) does.
PS:
One idea I got is to implement a kernel in CUDA. But it would be more complicated.
UPDATE 1:
Dan Mašek's answer works fine, but first we have to convert the RGB image to grayscale. That could be problematic when two colors have the same grayscale value.
I paste the working code here. I hope it helps others.
lut = np.ones(256, dtype=np.uint8) * 255
lut[[255,29,179,150,226,76]] = np.arange(6, dtype=np.uint8)
im_out = cv2.LUT(cv2.cvtColor(im, cv2.COLOR_BGR2GRAY), lut)
What about this?
color2index = {
(255, 255, 255) : 0,
(0, 0, 255) : 1,
(0, 255, 255) : 2,
(0, 255, 0) : 3,
(255, 255, 0) : 4,
(255, 0, 0) : 5
}
def rgb2mask(img):
    assert len(img.shape) == 3
    height, width, ch = img.shape
    assert ch == 3
    # Encode each pixel's three channels as one integer id
    W = np.power(256, [[0],[1],[2]])
    img_id = img.dot(W).squeeze(-1)
    values = np.unique(img_id)
    mask = np.zeros(img_id.shape)
    for i, c in enumerate(values):
        try:
            mask[img_id==c] = color2index[tuple(img[img_id==c][0])]
        except KeyError:
            # Colors missing from color2index keep index 0
            pass
    return mask
Then just call:
mask = rgb2mask(img)
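The packing idea can be taken one step further so that no per-color Python loop remains. This is a sketch under assumptions rather than a tuned implementation: each pixel is packed into one integer, and the index is found with a sorted lookup. The names `pack_rgb` and `rgb2index_packed` and the `unknown` fill value are hypothetical. The input is assumed to be in RGB order, so an image from cv2.imread() would need `im[..., ::-1]` first.

```python
import numpy as np

color2index = {
    (255, 255, 255): 0,
    (0, 0, 255): 1,
    (0, 255, 255): 2,
    (0, 255, 0): 3,
    (255, 255, 0): 4,
    (255, 0, 0): 5,
}

def pack_rgb(rgb):
    """Collapse the last axis (R, G, B) into one integer R*65536 + G*256 + B."""
    rgb = np.asarray(rgb, dtype=np.int64)
    return (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]

def rgb2index_packed(img_rgb, mapping=color2index, unknown=255):
    """Map an (H, W, 3) RGB image to an (H, W) uint8 index image."""
    keys = pack_rgb(list(mapping.keys()))
    values = np.array(list(mapping.values()), dtype=np.uint8)
    # Sort the codebook once so searchsorted can be used for the lookup.
    order = np.argsort(keys)
    keys, values = keys[order], values[order]

    packed = pack_rgb(img_rgb)
    pos = np.clip(np.searchsorted(keys, packed), 0, len(keys) - 1)
    # Pixels whose color is not in the codebook get the `unknown` value.
    found = keys[pos] == packed
    return np.where(found, values[pos], np.uint8(unknown)).astype(np.uint8)
```

Unlike the grayscale LUT approach, two colors can never collide here, because the packed integer is unique for every 8-bit RGB triple.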
Here's a small utility function to convert images (np.array) to per-pixel labels (indices), which can also be a one-hot encoding:
def rgb2label(img, color_codes = None, one_hot_encode=False):
    if color_codes is None:
        color_codes = {val:i for i,val in enumerate(set( tuple(v) for m2d in img for v in m2d ))}
    n_labels = len(color_codes)
    result = np.ndarray(shape=img.shape[:2], dtype=int)
    result[:,:] = -1
    for rgb, idx in color_codes.items():
        result[(img==rgb).all(2)] = idx
    if one_hot_encode:
        one_hot_labels = np.zeros((img.shape[0],img.shape[1],n_labels))
        # one-hot encoding
        for c in range(n_labels):
            one_hot_labels[: , : , c ] = (result == c ).astype(int)
        result = one_hot_labels
    return result, color_codes
img = cv2.imread("input_rgb_for_labels.png")
img_labels, color_codes = rgb2label(img)
print(color_codes) # e.g. to see what the codebook is
img1 = cv2.imread("another_rgb_for_labels.png")
img1_labels, _ = rgb2label(img1, color_codes) # use the same codebook
It calculates (and returns) the color codebook if None is supplied.
Actually, the for-loop is what takes most of the time. A vectorized comparison builds the mask for one color without any loop:
binary_mask = (im_array[:,:,0] == 255) & (im_array[:,:,1] == 255) & (im_array[:,:,2] == 0)
Maybe the code above can help you.
I've implemented a naive function: …
I firstly tried the ufunc of numpy with something like this: …
I suggest using an even more naive function which converts just one pixel:
def rgb2index(rgb):
"""
turn a 3 channel RGB color to 1 channel index color
"""
return color2index[tuple(rgb)]
Then using a numpy routine is a good idea, but we don't need a ufunc:
np.apply_along_axis(rgb2index, 2, im)
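Here numpy.apply_along_axis() is used to apply our rgb2index() function to the RGB slices along the last of the three axes (0, 1, 2) for the whole image im. Note that cv2.imread() returns channels in BGR order, so if the keys of color2index are RGB tuples, pass im[..., ::-1] instead of im.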
We could even do without the function and just write:
np.apply_along_axis(lambda rgb: color2index[tuple(rgb)], 2, im)
Similar to what Armali and Mendrika proposed, I somehow had to tweak it a little bit to get it to work (maybe totally my fault). So I just wanted to share a snippet that works.
COLORS = np.array([
    [0, 0, 0],
    [0, 0, 255],
    [255, 0, 0]
])
# Base 256 gives every 8-bit RGB triple a distinct hash
W = np.power(256, [0, 1, 2])
HASHES = np.sum(W * COLORS, axis=-1)
HASH2COLOR = {h : c for h, c in zip(HASHES, COLORS)}
HASH2IDX = {h: i for i, h in enumerate(HASHES)}
def rgb2index(segmentation_rgb):
    """
    turn a 3 channel RGB color to 1 channel index color
    """
    s_shape = segmentation_rgb.shape
    s_hashes = np.sum(W * segmentation_rgb, axis=-1)
    # Each slice along axis 0 holds a single hash value
    func = lambda x: HASH2IDX[int(x[0])]
    segmentation_idx = np.apply_along_axis(func, 0, s_hashes.reshape((1, -1)))
    segmentation_idx = segmentation_idx.reshape(s_shape[:2])
    return segmentation_idx
segmentation = np.array([[0, 0, 0], [0, 0, 255], [255, 0, 0]] * 3).reshape((3, 3, 3))
rgb2index(segmentation)
Example plot
The code is also available here:
https://github.com/theRealSuperMario/supermariopy/blob/dev/scripts/rgb2labels.py
Did you check Pillow library https://python-pillow.org/? As I remember, it has some classes and methods to deal with color conversion. See: https://pillow.readthedocs.io/en/4.0.x/reference/Image.html#PIL.Image.Image.convert
If you are happy using MATLAB - maybe saving the result as *.mat and loading with scipy.io.loadmat - there is the rgb2ind function in MATLAB, which does exactly what you are asking for. If not, it could be used as inspiration for a similar implementation in Python.