I have an image of a sticky note on a background (say a wall, or a laptop) and I want to detect the edges of the sticky note (rough detection also works fine) so that I can run a crop on it.
I plan on using ImageMagick for the actual cropping, but am stuck on detecting the edges.
Ideally, my output should give me 4 coordinates for the 4 border points so I can run my crop on it.
How should I proceed with this?
You can do that with ImageMagick.
There are different IM methods one could come up with. Here is the first algorithm that came to mind. It assumes the "sticky notes" are not tilted or rotated on the larger image:
First stage: use canny edge detection to reveal the edges of the sticky note.
Second stage: determine the coordinates of the edges.
Canny Edge Detection
This command will create a black+white image depicting all edges in the original image:
convert \
http://i.stack.imgur.com/SxrwG.png \
-canny 0x1+10%+30% \
canny-edges.png
Determine Coordinates of Edges
Assume the image is sized XxY pixels. You can then resize the image into a 1xY column of pixels and an Xx1 row of pixels, where each pixel's value is the average of all the pixels that were in the same row (or column, respectively) of the original image.
As an example, which can be seen below, I'll first resize the new canny-edges.png to 4xY and Xx4 images:
identify -format "%Wx%H\n" canny-edges.png
400x300
convert canny-edges.png -resize 400x4\! canny-4cols.png
convert canny-edges.png -resize 4x300\! canny-4rows.png
canny-4cols.png
canny-4rows.png
Now that the previous images have visualized what the compression-resizing of an image into a few columns or rows of pixels achieves, let's do it with a single column and a single row. At the same time, we'll change the output format to text instead of PNG, so we can read off the coordinates of the white pixels:
convert canny-edges.png -resize 400x1\! canny-1col.txt
convert canny-edges.png -resize 1x300\! canny-1row.txt
Here is part of the output from canny-1col.txt:
# ImageMagick pixel enumeration: 400,1,255,gray
0,0: (0,0,0) #000000 gray(0)
1,0: (0,0,0) #000000 gray(0)
2,0: (0,0,0) #000000 gray(0)
[....]
73,0: (0,0,0) #000000 gray(0)
74,0: (0,0,0) #000000 gray(0)
75,0: (10,10,10) #0A0A0A gray(10)
76,0: (159,159,159) #9F9F9F gray(159)
77,0: (21,21,21) #151515 gray(21)
78,0: (156,156,156) #9C9C9C gray(156)
79,0: (14,14,14) #0E0E0E gray(14)
80,0: (3,3,3) #030303 gray(3)
81,0: (3,3,3) #030303 gray(3)
[....]
162,0: (3,3,3) #030303 gray(3)
163,0: (4,4,4) #040404 gray(4)
164,0: (10,10,10) #0A0A0A gray(10)
165,0: (7,7,7) #070707 gray(7)
166,0: (8,8,8) #080808 gray(8)
167,0: (8,8,8) #080808 gray(8)
168,0: (8,8,8) #080808 gray(8)
169,0: (9,9,9) #090909 gray(9)
170,0: (7,7,7) #070707 gray(7)
171,0: (10,10,10) #0A0A0A gray(10)
172,0: (5,5,5) #050505 gray(5)
173,0: (13,13,13) #0D0D0D gray(13)
174,0: (6,6,6) #060606 gray(6)
175,0: (10,10,10) #0A0A0A gray(10)
176,0: (10,10,10) #0A0A0A gray(10)
177,0: (7,7,7) #070707 gray(7)
178,0: (8,8,8) #080808 gray(8)
[....]
319,0: (3,3,3) #030303 gray(3)
320,0: (3,3,3) #030303 gray(3)
321,0: (14,14,14) #0E0E0E gray(14)
322,0: (156,156,156) #9C9C9C gray(156)
323,0: (21,21,21) #151515 gray(21)
324,0: (159,159,159) #9F9F9F gray(159)
325,0: (10,10,10) #0A0A0A gray(10)
326,0: (0,0,0) #000000 gray(0)
327,0: (0,0,0) #000000 gray(0)
[....]
397,0: (0,0,0) #000000 gray(0)
398,0: (0,0,0) #000000 gray(0)
399,0: (0,0,0) #000000 gray(0)
As you can see, the averaging also left intermediate grayscale values around the detected edges. So we can introduce an additional -threshold 50% operation into our commands to get pure black+white output:
convert canny-edges.png -resize 400x1\! -threshold 50% canny-1col.txt
convert canny-edges.png -resize 1x300\! -threshold 50% canny-1row.txt
I'll not quote the contents of the new text files here; you can try it and look for yourself if you are interested. Instead, I'll take a shortcut: I'll output the textual representation of the pixel color values to <stdout> and directly grep it for all non-black pixels:
convert canny-edges.png -resize 400x1\! -threshold 50% txt:- \
| grep -v black
# ImageMagick pixel enumeration: 400,1,255,srgb
76,0: (255,255,255) #FFFFFF white
78,0: (255,255,255) #FFFFFF white
322,0: (255,255,255) #FFFFFF white
324,0: (255,255,255) #FFFFFF white
convert canny-edges.png -resize 1x300\! -threshold 50% txt:- \
| grep -v black
# ImageMagick pixel enumeration: 1,300,255,srgb
0,39: (255,255,255) #FFFFFF white
0,41: (255,255,255) #FFFFFF white
0,229: (255,255,255) #FFFFFF white
0,231: (255,255,255) #FFFFFF white
From the above results you can conclude that the corner coordinates of the
sticky note inside the other image are:
upper left corner: (77|40)
lower right corner: (323|230)
The width of the area is 246 pixels and the height is 190 pixels.
(ImageMagick puts the origin of its coordinate system at the upper left corner of an image.)
To now cut the sticky note from the original image you can do:
convert http://i.stack.imgur.com/SxrwG.png[246x190+77+40] sticky-note.png
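If you want to compute that crop geometry programmatically rather than reading it off by hand, here is a minimal Python sketch of the same projection idea. It assumes Pillow and NumPy are available and starts from the canny-edges.png generated above; the 50% threshold is mimicked by comparing the averaged profiles against 127:

import numpy as np
from PIL import Image

# Load the canny edge image as grayscale
edges = np.array(Image.open('canny-edges.png').convert('L'))

# Average each column/row, then keep positions whose average passes 50%
cols = np.where(edges.mean(axis=0) > 127)[0]   # like the 400x1 strip
rows = np.where(edges.mean(axis=1) > 127)[0]   # like the 1x300 strip

x0, x1 = cols.min(), cols.max()
y0, y1 = rows.min(), rows.max()

# Geometry string for ImageMagick's crop, WxH+X+Y
print('{}x{}+{}+{}'.format(x1 - x0, y1 - y0, x0, y0))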
More options to explore
autotrace
You can streamline the above procedure even further (and even turn it into an automatically working script if you want) by converting the intermediate "canny-edges.png" into an SVG vector graphic, for example by running it through autotrace...
This could be useful if your sticky note is tilted or rotated.
Hough Line Detection
Once you have the "canny" lines, you could also apply the Hough Line Detection algorithm on them:
convert \
canny-edges.png \
-background black \
-stroke red \
-hough-lines 5x5+20 \
lines.png
Note that the -hough-lines operator extends and draws detected lines from one edge (with floating point values) to another edge of the original image.
While the previous command finally converted the lines to a PNG, the -hough-lines operator actually generates an MVG file (Magick Vector Graphics) internally. That means you could read the source code of the MVG file and determine the mathematical parameters of each line which is depicted in the "red lines" image:
convert \
canny-edges.png \
-hough-lines 5x5+20 \
lines.mvg
This is more sophisticated and also works for edges which are not strictly horizontal and/or vertical.
But your example image does use horizontal and vertical edges, so you can even use simple shell commands to discover these.
There are 80 line descriptions in total in the generated MVG file. You can identify all horizontal lines in that file:
cat lines.mvg \
| while read a b c d e ; do \
if [ x${b/0,/} == x${c/400,/} ]; then \
echo "$a $b $c $d $e" ; \
fi; \
done
line 0,39.5 400,39.5 # 249
line 0,62.5 400,62.5 # 48
line 0,71.5 400,71.5 # 52
line 0,231.5 400,231.5 # 249
Now identify all vertical lines:
cat lines.mvg \
| while read a b c d e; do \
if [ x${b/,0/} == x${c/,300/} ]; then \
echo "$a $b $c $d $e" ; \
fi; \
done
line 76.5,0 76.5,300 # 193
line 324.5,0 324.5,300 # 193
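If you'd rather not wrangle shell parameter expansion, here is a small Python sketch that parses lines.mvg the same way. It assumes the 400x300 example and the line format shown above ("line x1,y1 x2,y2 # count"):

# Split the detected Hough lines into horizontal and vertical ones
horizontal, vertical = [], []
with open('lines.mvg') as f:
    for row in f:
        parts = row.split()
        if not parts or parts[0] != 'line':
            continue   # skip MVG headers and other primitives
        x1, y1 = map(float, parts[1].split(','))
        x2, y2 = map(float, parts[2].split(','))
        if y1 == y2:
            horizontal.append(y1)
        elif x1 == x2:
            vertical.append(x1)
print(sorted(horizontal), sorted(vertical))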
I ran into a similar problem of detecting image borders (whitespace) last week and spent many hours trying various approaches and tools; I finally solved it using an entropy-difference approach, so JFYI here is the algorithm.
Let's assume you want to detect if your 200x100px image has border at the top:
Get upper piece of image 25% height (25px) (0: 25, 0: 200)
Get a lower piece with the same height, starting where the upper piece ends and extending toward the image center (25: 50, 0: 200)
upper and lower pieces depicted
Calculate entropies for the both pieces
Find entropy difference and store it with current block height
Make the upper piece 1px shorter (24px) and repeat from step 2 until we hit the image edge (height 0), resizing the scan area every iteration and thus sliding toward the image edge
Find the maximum of the stored entropy differences and its block height: this is the center of our border, provided it lies closer to the edge than to the center of the image and the maximum entropy difference exceeds a preset threshold (0.5, for example)
And apply this algorithm to every side of your image.
Here is a piece of code to detect whether an image has a top border and to find its approximate coordinate (offset from top); pass a grayscale ('L' mode) Pillow image to the scan function:
import numpy as np

MEDIAN = 0.5

def entropy(signal):
    # Shannon entropy of a flat pixel array (a minimal helper; the full
    # source linked below defines its own)
    _, counts = np.unique(signal, return_counts=True)
    probs = counts / signal.size
    return -np.sum(probs * np.log2(probs))

def scan(im):
    w, h = im.size
    array = np.array(im)
    center_ = None
    diff_ = None
    # slide the boundary from one quarter of the height up to the edge
    for center in reversed(range(1, h // 4 + 1)):
        upper = entropy(array[0: center, 0: w].flatten())
        lower = entropy(array[center: 2 * center, 0: w].flatten())
        diff = upper / lower if lower != 0.0 else MEDIAN
        if diff_ is None or diff < diff_:
            center_ = center
            diff_ = diff
    # border found if the entropy ratio is low enough and the position
    # lies in the top quarter of the image
    return diff_ < MEDIAN and center_ < h // 4, center_, diff_
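For example (a usage sketch; the filename is hypothetical):

from PIL import Image

im = Image.open('photo.png').convert('L')   # grayscale, as required
has_top_border, offset, ratio = scan(im)
print(has_top_border, offset, ratio)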
Full source with examples of bordered and clear (not bordered) images processed is here: https://github.com/embali/enimda/
Related
I'm trying to find a way to transform an image by translating one of its vertexes.
I have already found various methods for transforming an image like rotation and scaling, but none of the methods involved skewing like so:
There is shearing, but it's not the same, since it can move two or more of the image's vertices while I only want to move one.
What can I use that can perform such an operation?
I took your "cat-thing" and resized it to a nice size, added some perfectly vertical and horizontal white gridlines, and added some extra canvas in red at the bottom to give myself room to transform it. That gave me this, which is 400 pixels wide and 450 pixels tall:
I then used ImageMagick to do a "Bilinear Forward Transform" in Terminal. Basically you give it 4 pairs of points: the first pair is where the top-left corner is before the transform and where it must move to. The next pair is where the top-right corner is originally, followed by where it ends up. Then the bottom-right. Then the bottom-left. As you can see, 3 of the 4 pairs are unmoved; only the bottom-right corner moves. I also made the virtual pixel black so you can see where pixels were invented by the transform in black:
convert cat.png -matte -virtual-pixel black -interpolate Spline -distort BilinearForward '0,0 0,0 399,0 399,0 399,349 330,430 0,349 0,349' bilinear.png
I also did a "Perspective Transform" using the same transform coordinates:
convert cat.png -matte -virtual-pixel black -distort Perspective '0,0 0,0 399,0 399,0 399,349 330,430 0,349 0,349' perspective.png
Finally, to illustrate the difference, I made a flickering comparison between the 2 images so you can see the difference:
I am indebted to Anthony Thyssen for his excellent work here which I commend to you.
I understand you were looking for a Python solution and would point out that there is a Python binding to ImageMagick called Wand, which you may like to use - here.
Note that I only used red and black to illustrate what is going on (atop the Stack Overflow white background) and where aspects of the result come from; you would obviously use white for both!
The perspective transformation is likely what you want, since it preserves straight lines at any angle. (The inverse bilinear only preserves horizontal and vertical straight lines).
Here is how to do it in ImageMagick, Python Wand (based upon ImageMagick) and Python OpenCV.
Input:
ImageMagick
(Note the +distort makes the output the needed size to hold the full result and is not restricted to the size of the input. Also the -virtual-pixel white sets color of the area outside the image pixels to white. The points are ordered clockwise from the top left in pairs as inx,iny outx,outy)
convert cat.png -virtual-pixel white +distort perspective \
"0,0 0,0 359,0 359,0 379,333 306,376 0,333 0,333" \
cat_perspective_im.png
Python Wand
(Note the best_fit=true makes the output the needed size to hold the full result and is not restricted to the size of the input.)
#!/bin/python3.7
from wand.image import Image
from wand.display import display
with Image(filename='cat.png') as img:
    img.virtual_pixel = 'white'
    img.distort('perspective', (0,0, 0,0, 359,0, 359,0, 379,333, 306,376, 0,333, 0,333), best_fit=True)
    img.save(filename='cat_perspective_wand.png')
    display(img)
Python OpenCV
#!/bin/python3.7
import cv2
import numpy as np
# Read source image.
img_src = cv2.imread('cat.png')
# Four corners of source image
# Coordinates are in x,y system with x horizontal to the right and y vertical downward
pts_src = np.float32([[0,0], [359,0], [379,333], [0,333]])
# Four corners of destination image.
pts_dst = np.float32([[0, 0], [359,0], [306,376], [0,333]])
# Get perspective transform matrix from the 4 point pairs
m = cv2.getPerspectiveTransform(pts_src,pts_dst)
# Warp source image to destination based on matrix
# size argument is width x height
# compute from max output coordinates
img_out = cv2.warpPerspective(img_src, m, (359+1,376+1), flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT, borderValue=(255, 255, 255))
# Save output
cv2.imwrite('cat_perspective_opencv.png', img_out)
# Display result
cv2.imshow("Warped Source Image", img_out)
cv2.waitKey(0)
cv2.destroyAllWindows()
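As a side note, instead of hard-coding the output size you could derive it from the warped source corners. A hedged sketch, continuing with the m and pts_src defined above:

# cv2.perspectiveTransform expects points shaped (N,1,2)
warped = cv2.perspectiveTransform(pts_src.reshape(-1, 1, 2), m)
max_x, max_y = warped[:, 0, :].max(axis=0)
dsize = (int(np.ceil(max_x)) + 1, int(np.ceil(max_y)) + 1)   # reproduces (360, 377)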
I have an image created by many polygons of different solid colors. The coordinates themselves are not given, but can be detected if necessary.
I'm looking for a way to detect all points which are the intersection of 3 or more different colors. The colors are not known in advance and might be similar to each other (e.g. one might be (255, 255, 250) and another (255, 255, 245); the specific shade doesn't matter, just the fact that it is different).
For example, in the following image a tiny star marks all the points that I'm looking for.
As your annotations have obscured the intersections you are trying to identify, I made a new, similar image.
Rather than trying to bend my brain around trying to deal with 3-dimensions of 8-bit RGB colour, I converted that to a single 24-bit integer and then ran a generic filter from SciPy and counted the number of unique colours in each 3x3 window and made a new image from that. So each pixel in the result has a brightness value equal to the number of colours in its neighbourhood. I counted the number of colours by converting the Numpy array of neighbours into a Python set - exploiting the fact that a set can only have unique numbers in it.
#!/usr/bin/env python3
import numpy as np
from PIL import Image
from scipy.ndimage import generic_filter
# CountUnique
def CountUnique(P):
    """
    We receive P[0]..P[8] with the pixels in the 3x3 surrounding window, return count of unique values
    """
    return len(set(P))
# Open image and make into Numpy array
PILim = Image.open('patches.png').convert('RGB')
RGBim = np.array(PILim)
# Make a single channel 24-bit image rather than 3 channels of 8-bit each
RGB24 = (RGBim[...,0].astype(np.uint32)<<16) | (RGBim[...,1].astype(np.uint32)<<8) | RGBim[...,2].astype(np.uint32)
# Run generic filter counting unique colours in neighbourhood
result = generic_filter(RGB24, CountUnique, (3, 3))
# Save result
Image.fromarray(result.astype(np.uint8)).save('result.png')
The resultant image is shown here, with the contrast stretched so that you can see the brightest pixels at the intersections you seek.
A histogram of the values in the result image shows there are 21 pixels which have 3 unique colours in their 3x3 neighbourhood and 4,348 pixels which have 2 unique colours in their neighbourhood. You can find these by running np.where(result==3), for example.
Histogram:
155631: ( 1, 1, 1) #010101 gray(1)
4348: ( 2, 2, 2) #020202 gray(2)
21: ( 3, 3, 3) #030303 gray(3)
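To expand on the np.where hint above, a short sketch continuing from the result array:

# Coordinates of pixels whose 3x3 neighbourhood contains 3 unique colours
ys, xs = np.where(result == 3)
for x, y in zip(xs, ys):
    print('3-colour point near ({},{})'.format(x, y))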
For extra fun, I had a go at programming the method suggested by @Micka, and that gives the same results. The code looks like this:
#!/usr/bin/env python3
import numpy as np
from PIL import Image
from skimage.morphology import dilation, disk
# Open image and make into Numpy array
PILim = Image.open('patches.png').convert('RGB')
RGBim = np.array(PILim)
h, w = RGBim.shape[0], RGBim.shape[1]
# Make a single channel 24-bit image rather than 3 channels of 8-bit each
RGB24 = (RGBim[...,0].astype(np.uint32)<<16) | (RGBim[...,1].astype(np.uint32)<<8) | RGBim[...,2].astype(np.uint32)
# Make list of unique colours
UniqueColours = np.unique(RGB24)
# Create result image
result = np.zeros((h,w),dtype=np.uint8)
# Make mask for any particular colour - same size as original image
mask = np.zeros((h,w), dtype=np.uint8)
# Make disk-shaped structuring element for morphology
selem = disk(1)
# Iterate over unique colours
for u in UniqueColours:
    # Turn on all pixels matching this unique colour, turn off all others
    mask = np.where(RGB24==u,1,0)
    # Dilate (fatten) the mask by 1 pixel
    mask = dilation(mask,selem)
    # Add all activated pixels to result image
    result = result + mask
# Save result
Image.fromarray(result.astype(np.uint8)).save('result.png')
For reference, I created the image with anti-aliasing disabled in ImageMagick at the command line like this:
convert -size 400x400 xc:red -background red +antialias \
-fill blue -draw "polygon 42,168 350,72 416,133 416,247 281,336" \
-fill yellow -draw "polygon 271,11 396,127 346,154 77,86" \
-fill lime -draw "polygon 366,260 366,400 120,400" patches.png
Keywords: Python, image, image processing, intersect, intersection, PIL/Pillow, adjacency, neighbourhood, neighborhood, neighbour, neighbor, generic, SciPy, 3x3, filter.
I'd like to be able to automagically convert full color images down to three color (black / red / white) for an e-ink display (Waveshare 7.5"). Right now I'm just letting the screen handle it, but as expected complex images get washed out.
Are there any algorithms or filters I could apply to make things a bit more visible?
Right now I'm using Python, but I'm not averse to other languages/environments if necessary.
Good image:
Washed out image:
You could make your own palette of 3 acceptable colours like this:
magick xc:red xc:white xc:black +append palette.gif
Then you can apply it to your image like this:
magick input.png +dither -remap palette.gif result.png
If you want to send it straight to the frame buffer and it supports RGB888, you can try running something like this:
magick input.png +dither -remap palette.gif -depth 8 RGB:/dev/fb0
Just adding a bit to Mark Setchell's answer. For printing you might be better off dithering your 3 colors. So here is your image with and without dithering, using ImageMagick 7. If using ImageMagick 6, replace magick with convert.
Input:
Create 3 color palette:
magick xc:red xc:white xc:black +append palette.gif
With dithering (the default is Floyd-Steinberg):
magick input.png -remap palette.gif result.png
Without dithering:
magick input.png -dither none -remap palette.gif result2.png
If you want Python, then you could try Python Wand. It is based upon Imagemagick.
ADDITION:
To separate the red and black into two images, each of which is represented by black with the rest as white, you can do the following and save as BMP, as you wanted in your comments. (You can do this with or without the dithering from above, as you desire.)
magick result.png -color-threshold "red-red" -negate red.bmp
magick result.png -color-threshold "black-black" -negate black.bmp
Red:
Black:
You appear to be choosing the nearest color for each pixel. See if a dithering algorithm works better for your purposes. Generally, dithering algorithms take into account neighboring pixels when determining how to color a given pixel.
EDIT: In the case of PIL (the Python Imaging Library), it doesn't seem trivial to dither to an arbitrary set of three colors, at least as of 2012.
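For what it's worth, recent versions of Pillow make this easier: quantize accepts a fixed palette and, in newer releases, a dither argument. A hedged sketch (the filenames are hypothetical, and the Dither enum needs Pillow 9.1 or later):

from PIL import Image

# Build a palette image holding exactly red, white and black
palette = Image.new('P', (1, 1))
palette.putpalette([255, 0, 0,  255, 255, 255,  0, 0, 0] + [0] * 759)

img = Image.open('input.png').convert('RGB')
out = img.quantize(palette=palette, dither=Image.Dither.FLOYDSTEINBERG)
out.convert('RGB').save('result.png')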
Just adding a bit to Mark and Fred's answers. I'm using ImageMagick on Raspberry Pi, which is version < 7 and uses "convert". Some of the commands Fred suggested didn't work with that version. Here's what I did to resize, remap and dither, and split the image into white-and-black and white-and-red sub-images.
# Create palette with red, white and black colors
convert xc:red xc:white xc:black +append palette.gif
# Resize input file into size suitable for ePaper Display - 264x176
# Converting to BMP.
# Note, if working with JPG, it is a lossy
# format and subsequently remapping and working with it results
# in the color palette getting overwritten - we just convert to BMP
# and work with that instead
convert "$1" -resize 264x176^ -gravity center -extent 264x176 resized.bmp
# Remap the resized image into the colors of the palette using
# Floyd Steinberg dithering (default)
# Resulting image will have only 3 colors - red, white and black
convert resized.bmp -remap palette.gif result.bmp
# Replace all the red pixels with white - this
# isolates the white and black pixels - i.e the "black"
# part of image to be rendered on the ePaper Display
convert -fill white -opaque red result.bmp result_black.bmp
# Similarly, Replace all the black pixels with white - this
# isolates the white and red pixels - i.e the "red"
# part of image to be rendered on the ePaper Display
convert -fill white -opaque black result.bmp result_red.bmp
I've also implemented it using Python Wand, a Python layer over ImageMagick:
import io
import traceback

from PIL import Image
from wand.image import Image as WandImage

# This function takes as input a filename for an image
# It resizes the image into the dimensions supported by the ePaper Display
# It then remaps the image into a tri-color scheme using a palette (affinity)
# for remapping, and the Floyd-Steinberg algorithm for dithering
# It then splits the image into two component parts:
# a white and black image (with the red pixels removed)
# a white and red image (with the black pixels removed)
# It then converts these into PIL Images and returns them
# The PIL Images can be used by the ePaper library to display
def getImagesToDisplay(filename):
    print(filename)
    red_image = None
    black_image = None
    try:
        with WandImage(filename=filename) as img:
            img.resize(264, 176)
            with WandImage() as palette:
                with WandImage(width=1, height=1, pseudo="xc:red") as red:
                    palette.sequence.append(red)
                with WandImage(width=1, height=1, pseudo="xc:black") as black:
                    palette.sequence.append(black)
                with WandImage(width=1, height=1, pseudo="xc:white") as white:
                    palette.sequence.append(white)
                palette.concat()
                img.remap(affinity=palette, method='floyd_steinberg')
            red = img.clone()
            black = img.clone()
            red.opaque_paint(target='black', fill='white')
            # This is not necessary - making the white and red image
            # white and black instead - left here FYI
            # red.opaque_paint(target='red', fill='black')
            black.opaque_paint(target='red', fill='white')
            red_image = Image.open(io.BytesIO(red.make_blob("bmp")))
            black_image = Image.open(io.BytesIO(black.make_blob("bmp")))
    except Exception:
        print('traceback.format_exc():\n%s' % traceback.format_exc())
    return (red_image, black_image)
Here's my writeup on my project on Hackster (including full source code links) - https://www.hackster.io/sridhar-rajagopal/photostax-digital-epaper-photo-frame-84d4ed
I've attributed both Mark and Fred there - thank you!
I am analyzing medical images. All images have a marker with the position. It looks like this
It is the "TRH RMLO" annotation in this image, but it can be different in other images. Also the size varies. The image is cropped but you see that the tissue is starting on the right side.
I found that the presence of these markers distort my analysis.
How can I remove them?
I load the image in python like this
import dicom
import numpy as np
img = dicom.read_file("my_image.dcm")
img_array = img.pixel_array
The image is then a numpy array. The white text is always surrounded by a large black area (black has value zero). The marker is in a different position in each image.
How can I remove the white text without hurting the tissue data?
UPDATE
added a second image
UPDATE2:
Here are two of the original dicom files. All personal information has been removed. (edit: removed)
Looking at the actual pixel values of the image you supplied, you can see that the marker is almost (99.99%) pure white and this doesn't occur elsewhere in the image so you can isolate it with a simple 99.99% threshold.
I prefer ImageMagick at the command-line, so I would do this:
convert sample.dcm -threshold 99.99% -negate mask.png
convert sample.dcm mask.png -compose darken -composite result.jpg
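Since you already have the pixels in a NumPy array, the same idea works directly in Python. A minimal sketch; taking 99.99% of the array maximum as the cutoff is my assumption:

import numpy as np

# Blacken everything at or above 99.99% of the maximum intensity,
# i.e. the near-pure-white marker text
threshold = 0.9999 * img_array.max()
img_array[img_array >= threshold] = 0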
Of course, if the sample image is not representative, you may have to work harder. Let's look at that...
If the simple threshold doesn't work for your images, I would look at "Hit and Miss Morphology". Basically, you threshold your image to pure black and white - at around 90% say, and then you look for specific shapes, such as the corner markers on the label. So, if we want to look for the top-left corner of a white rectangle on a black background, and we use 0 to mean "this pixel must be black", 1 to mean "this pixel must be white" and - to mean "we don't care", we would use this pattern:
0 0 0 0 0
0 1 1 1 1
0 1 - - -
0 1 - - -
0 1 - - -
Hopefully you can see the top left corner of a white rectangle there. That would be like this in the Terminal:
convert sample.dcm -threshold 90% \
-morphology HMT '5x5:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' result.png
Now we also want to look for top-right, bottom-left and bottom-right corners, so we need to rotate the pattern, which ImageMagick handily does when you add the > flag:
convert sample.dcm -threshold 90% \
-morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' result.png
Hopefully you can see dots demarcating the corners of the logo now, so we could ask ImageMagick to trim the image of all extraneous black and just leave the white dots and then tell us the bounding box:
convert sample.dcm -threshold 90% \
-morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' -format %# info:
308x198+1822+427
So, if I now draw a red box around those coordinates, you can see where the label has been detected - of course in practice I would draw a black box to cover it but I am explaining the idea:
convert sample.dcm -fill "rgba(255,0,0,0.5)" -draw "rectangle 1822,427 2130,625" result.png
If you want a script to do that automagically, I would use something like this, saving it as HideMarker:
#!/bin/bash
input="$1"
output="$2"
# Find corners of overlaid marker using Hit and Miss Morphology, then get crop box
IFS="x+" read w h x1 y1 < <(convert "$input" -threshold 90% -morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' -format %# info:)
# Calculate bottom-right corner from top-left and dimensions
((x1=x1-1))
((y1=y1-1))
((x2=x1+w+1))
((y2=y1+h+1))
convert "$input" -fill black -draw "rectangle $x1,$y1 $x2,$y2" "$output"
Then you would do this to make it executable:
chmod +x HideMarker
And run it like this:
./HideMarker someImage.dcm result.png
I have another idea. This solution uses OpenCV with Python. It is a rather simple approach.
First, obtain the binary threshold of the image.
ret, th = cv2.threshold(img, 2, 255, cv2.THRESH_BINARY)
Perform morphological dilation:
kernel = np.ones((5, 5), np.uint8)   # structuring element; the size is an assumption
dilate = cv2.morphologyEx(th, cv2.MORPH_DILATE, kernel, iterations=3)
To join the gaps, I then used median filtering:
median = cv2.medianBlur(dilate, 9)
Now you can use the contour properties to eliminate the smallest contour and retain the other containing the image.
It also works for the second image:
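A hedged sketch of that contour-filtering step, continuing from the img and median variables above (the two-value return assumes OpenCV 4.x):

import cv2
import numpy as np

# Keep only the largest contour (the tissue) and black out the rest
contours, _ = cv2.findContours(median, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
mask = np.zeros_like(median)
cv2.drawContours(mask, [largest], -1, 255, -1)
cleaned = cv2.bitwise_and(img, img, mask=mask)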
If these annotations are in the DICOM file there are a couple ways they could be stored (see https://stackoverflow.com/a/4857782/1901261). The currently supported method can be cleaned off by simply removing the 60xx group attributes from the files.
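For instance, with pydicom (note: a different module from the dicom import in the question; the filename is hypothetical), that cleanup might look like:

import pydicom

ds = pydicom.dcmread('my_image.dcm')
# Overlay data lives in the repeating groups 6000,6002,...,60FF
overlay_tags = [elem.tag for elem in ds if (elem.tag.group & 0xFF00) == 0x6000]
for tag in overlay_tags:
    del ds[tag]
ds.save_as('my_image_clean.dcm')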
For the deprecated method (which is still commonly used) you can clear out the unused high bit annotations manually without messing up the other image data as well. Something like:
int position = object.getInt( Tag.OverlayBitPosition, 0 );
if( position == 0 ) return;
int bit = 1 << position;
int[] pixels = object.getInts( Tag.PixelData );
int count = 0;
for( int pix : pixels )
{
    int overlay = pix & bit;
    pixels[ count++ ] = pix - overlay;
}
object.putInts( Tag.PixelData, VR.OW, pixels );
If these are truly burned into the image data, you're probably stuck using one of the other recommendations here.
The good thing is that these watermarks are probably in an isolated, totally black area, which makes it easier (although it's questionable whether removing them is in line with the indicated usage; license stuff).
Without being an expert, here is one idea. It might be a sketch of some very powerful approach tailored to this problem, but you have to decide if implementation complexity and algorithmic complexity (very dependent on image statistics) are worth it:
Basic idea
Detect the four semi-cross-like borders
Calculate the defined rectangle from these
Black-out this rectangle
Steps
0
Binarize
1
Use some gradient-based edge-detector to get all the horizontal edges
There may be multiple; you can try to give min-length (maybe some morphology needed to connect pixels which are not connected based on noise in source or algorithm)
2
Use some gradient-based edge-detector to get all the vertical edges
Like the above, but a different orientation
3
Do some connected-component calculation to get some objects which are vertical and horizontal lines
Now you can try different choices of candidate components (8 real ones) using the following knowledge:
two of these components can be described by the same line (slope-intercept form; linear regression problem) -> line which borders the rectangle
it's probable that the best 4 pair-choices (according to linear-regression loss) are the valid borders of this rectangle
you might add the assumption, that vertical borders and horizontal borders are orthogonal to each other
4
Calculate the rectangle from these borders
Widen it by a few pixels (hyper-parameter)
Black-out that rectangle
That's the basic approach.
Alternative
This one is much less work, use more specialized tools and assumes the facts in the opening:
the stuff to remove is on some completely black part of the image
it's kind of isolated; distance to medical-data is high
Steps
Run some general OCR to detect characters
Get the occupied pixels / borders somehow (I'm not sure what OCR tools return)
Calculate some outer rectangle and black-out (using some predefined widening-gap; this one needs to be much bigger than the one above)
Alternative 2
Sketch only: The idea is to use something like binary closing on the image to build fully connected components out of the source pixels (while small gaps/holes are filled), so that we get one big component describing the medical data and one for the watermark. Then just remove the smaller one.
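A quick sketch of that idea with scikit-image (the disk radius and the choice of skimage are my assumptions; img_array is the pixel array from the question):

import numpy as np
from skimage.measure import label
from skimage.morphology import binary_closing, disk

binary = img_array > 0                      # non-black pixels
closed = binary_closing(binary, disk(15))   # fill small gaps/holes
labels = label(closed)
# keep only the largest connected component (the tissue)
largest = np.argmax(np.bincount(labels.ravel())[1:]) + 1
img_array[labels != largest] = 0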
I am sure this can be optimized, but ... You could create 4 patches of size 3x3 or 4x4 and initialize them with the exact pixel values of the individual corners of the frame surrounding the annotation text. You could then iterate over the whole image (or use some smart initialization looking only in the black area) and find the exact match for those patches. It is not very likely that you will have the same regular structure (a 90° corner surrounded by near-zero pixels) in the tissue, so this might give you the bounding box.
An even simpler approach is possible!
Just add the following after img_array = img.pixel_array:
img_array[img_array > X] = Y
Here X is the intensity threshold above which you want to eliminate pixels, and Y is the intensity value to use instead.
For example:
img_array[img_array > 4000] = 0
This replaces everything with intensity greater than 4000 with black (intensity 0).
I have a scanned image which is basically black print on some weird (non-gray) background, say, green or yellow (think old paper).
How can I get rid of the green/yellow and receive a gray picture with as much of the gray structure of the original image intact? I.e. I want to keep the gray around the letters for the anti-aliasing effect or for gray areas, but I want to turn anything that is even remotely green/yellow into pure white.
Note that the background is by no means homogeneous, so the algorithm should be able to accept a color and an error margin, or a color range.
For bonus points: How can I automatically determine the background color?
I'd like to use Python with the Imaging Library or maybe ImageMagick.
Note: I'm aware of packages like unpaper. My problem with unpaper is that it produces B&W images which probably look good for an OCR software but not for the human eye.
I am more of a C++ than a Python programmer, so I can't give you a code sample. But the general algorithm is something like this:
Finding the background color:
You make a histogram of the image. The histogram should have two peaks, representing the background and foreground colors. Because you know that the background has the higher intensity, you choose the brighter peak; that is the background color.
Now you have the RGB background (R_bg, G_bg, B_bg)
Setting the background to white:
You loop over all pixels and calculate each pixel's distance to the background:
distance = sqrt((R_bg - R_pixel) ^ 2 + (G_bg - G_pixel) ^ 2 + (B_bg - B_pixel) ^ 2)
If the distance is less than a threshold you set the pixel to white. You can experiment with different thresholds until you get a good result.
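Since the question mentions Python with the Imaging Library, here is a minimal numpy sketch of both steps. Picking the histogram peak per channel is a simplification of a true 3-D colour histogram, and the filename and threshold are assumptions:

import numpy as np
from PIL import Image

img = Image.open('scan.png').convert('RGB')
arr = np.asarray(img).astype(np.int32)

# Background colour: the most frequent value in each channel
bg = [np.bincount(arr[..., c].ravel(), minlength=256).argmax() for c in range(3)]

# Euclidean distance of every pixel from the background colour
dist = np.sqrt(((arr - np.array(bg)) ** 2).sum(axis=-1))

threshold = 60                 # experiment with this value
arr[dist < threshold] = 255    # near-background pixels become white
Image.fromarray(arr.astype(np.uint8)).save('result.png')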
I was looking to make an arbitrary background color transparent a while ago and developed this script. It takes the most popular (background) color in an image and creates an alpha mask where the transparency is proportional to the distance from the background color. Taking RGB colorspace distances is an expensive process for large images, so I've tried some optimization using numpy and a fast integer square-root approximation. Converting to HSV first might be the right approach. If you haven't solved your problem yet, I hope this helps:
from PIL import Image
import numpy

fldr = r'C:\python_apps'
fp = fldr + '\\IMG_0377.jpg'

rz = 0   # 2 will halve the size of the image, etc..

# ----------------

im = Image.open(fp)

if rz:
    w, h = im.size
    im = im.resize((w // rz, h // rz))

# most popular value per channel = background colour
hist = im.histogram()
r0, g0, b0 = [band.index(max(band)) for band in
              [hist[i * 256:(i + 1) * 256] for i in range(3)]]

def isqrt(n):
    # fast integer square root (Newton's method)
    xn = 1
    xn1 = (xn + n // xn) // 2
    while abs(xn1 - xn) > 1:
        xn = xn1
        xn1 = (xn + n // xn) // 2
    while xn1 * xn1 > n:
        xn1 -= 1
    return xn1

vsqrt = numpy.vectorize(isqrt)

def dist(image):
    # per-pixel RGB distance from the background colour, as an 8-bit image
    imarr = numpy.asarray(image, dtype=numpy.int32)
    d = (imarr[:, :, 0] - r0) ** 2 + (imarr[:, :, 1] - g0) ** 2 + (imarr[:, :, 2] - b0) ** 2
    d = numpy.asarray(vsqrt(d).clip(0, 255), dtype=numpy.uint8)
    return Image.fromarray(d, 'L')

im.putalpha(dist(im))
im.save(fldr + '\\test.png')
I know the question is old, but I was playing around with ImageMagick trying to do something similar, and came up with this:
convert text.jpg -fill white -fuzz 50% +opaque black out.jpg
which converts this:
into this:
As regards the "average" colour, I used this:
convert text.jpg -colors 2 -colorspace RGB -format %c histogram:info:-
5894: ( 50, 49, 19) #323113 rgb(50,49,19)
19162: (186,187, 87) #BABB57 rgb(186,187,87) <- THIS ONE !
which is this colour:
After some more experimentation, I can get this:
using this:
convert text.jpg -fill black -fuzz 50% -opaque rgb\(50,50,10\) -fill white +opaque black out.jpg