I am an undergraduate student, new to image processing and Python.
I have many images of plant samples with their descriptions (labels stuck on each sample), as shown in the figure below. I need to automatically segment only those labels from the sample.
I tried thresholding based on colour, but it failed. Could you please suggest an example of how to do this? I need some ideas or code to make the segmentation completely automatic.
Please help if you have experience with image processing and Python; I need your help to complete this task.
The rectangle is detected at the top left, but it should be at the bottom right. Could you please tell me where my mistake is and how to correct it?
I have also given the code below.
You can try template matching with a big white rectangle to identify the area where the information is stored.
http://docs.opencv.org/3.1.0/d4/dc6/tutorial_py_template_matching.html#gsc.tab=0
Once that is done, you will be able to recognize the characters in this area. Save a small subimage, and with a tool like pytesseract you will be able to read the characters.
https://pypi.python.org/pypi/pytesseract
There are other OCR options here, with some examples:
https://saxenarajat99.wordpress.com/2014/10/04/optical-character-recognition-in-python/
Good luck !
Why use a colour threshold? I tried this with ImageJ and got nice results. I just converted the image to 8-bit and binarised it using a fixed threshold (166 in this case). You can choose the best threshold from the image histogram.
Then you just need to find the white rectangle region and read the characters as FrsECM suggested.
Here's an example in C++:
#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"
#include <stdlib.h>
#include <stdio.h>
using namespace cv;
/// Global variables
int threshold_nvalue = 166;
const int thresh_increment = 2;
int threshold_type = THRESH_BINARY;//1
int const max_value = 255;
int const morph_size = 3;
int const min_blob_size = 1000;
Mat src, src_resized, src_gray, src_thresh, src_morph;
/**
* @function main
*/
int main(int argc, char** argv)
{
/// Load an image
src = imread("C:\\Users\\phili\\Pictures\\blatt.jpg", 1);
//Resize for displaying it properly
resize(src, src_resized, Size(600, 968));
/// Convert the image to gray (imread returns BGR channel order)
cvtColor(src_resized, src_gray, COLOR_BGR2GRAY);
/// Region of interest
Rect label_rect;
//Binarization using a fixed threshold
threshold(src_gray, src_thresh, threshold_nvalue, max_value, threshold_type);
//Erase small objects using morphology
Mat element = getStructuringElement(0, Size(2 * morph_size + 1, 2 * morph_size + 1), Point(morph_size, morph_size));
morphologyEx(src_thresh, src_morph, MORPH_CLOSE, element);
//find white objects and their contours
std::vector<std::vector<Point> > contours;
std::vector<Vec4i> hierarchy;
findContours(src_morph, contours, CV_RETR_TREE, CV_CHAIN_APPROX_NONE, Point(0, 0));
for (std::vector<std::vector<Point> >::iterator it = contours.begin(); it != contours.end(); ++it)
{
//just big blobs
if (it->size()>min_blob_size)
{
//approx contour and check for rectangle
std::vector<Point> approx;
approxPolyDP(*it, approx, 0.01*arcLength(*it, true), true);
if (approx.size() == 4)
{
//just for visualization (drawContours expects a vector of contours)
drawContours(src_resized, std::vector<std::vector<Point> >(1, approx), 0, Scalar(0, 255, 255), -1);
//bounding rect for ROI
label_rect = boundingRect(approx);
//exit loop
break;
}
}
}
//Region of interest
Mat label_roi = src_resized(label_rect);
//OCR comes here...
}
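Since the question asks for Python, here is a minimal NumPy-only sketch of the thresholding step. It is not a translation of the C++ above. To pick the threshold from the histogram it uses Otsu's method, which is my substitution (the answer used a fixed value of 166), and the helper names `otsu_threshold`, `binarize` and `bright_bounding_box` are made up for illustration. It also assumes that the label is the only bright region left after thresholding; on a real scan you would still filter blobs by size and shape as the C++ code does.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the grey level (0-255) that maximises the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    count_low = np.cumsum(hist)                 # pixels at or below each level
    count_high = hist.sum() - count_low         # pixels above each level
    sum_low = np.cumsum(hist * levels)
    sum_high = sum_low[-1] - sum_low
    valid = (count_low > 0) & (count_high > 0)  # both classes must be non-empty
    between = np.zeros(256)
    mean_low = sum_low[valid] / count_low[valid]
    mean_high = sum_high[valid] / count_high[valid]
    between[valid] = count_low[valid] * count_high[valid] * (mean_low - mean_high) ** 2
    return int(np.argmax(between))

def binarize(gray, thresh):
    """Pixels brighter than thresh become 255, everything else 0."""
    return np.where(gray > thresh, 255, 0).astype(np.uint8)

def bright_bounding_box(binary):
    """Bounding box (x, y, width, height) of all non-zero pixels, or None."""
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    if rows.size == 0:
        return None
    x, y = int(cols[0]), int(rows[0])
    return x, y, int(cols[-1]) - x + 1, int(rows[-1]) - y + 1

# Synthetic stand-in for a scan: dark sample with one bright label
gray = np.full((100, 100), 50, dtype=np.uint8)
gray[40:50, 30:50] = 220
t = otsu_threshold(gray)
box = bright_bounding_box(binarize(gray, t))
```

The returned box can then be used to crop the label (`gray[y:y+h, x:x+w]`) before passing it to OCR.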
I am trying to translate the green screen sample (https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/develop/examples/green_screen/main.cpp) from C++ to Python, however, there is one part I cannot figure out.
This is the C++ code from the sample:
k4a::image main_color_image = captures[0].get_color_image();
k4a::image main_depth_image = captures[0].get_depth_image();
// let's green screen out things that are far away.
// first: let's get the main depth image into the color camera space
k4a::image main_depth_in_main_color = create_depth_image_like(main_color_image);
main_depth_to_main_color.depth_image_to_color_camera(main_depth_image, &main_depth_in_main_color);
cv::Mat cv_main_depth_in_main_color = depth_to_opencv(main_depth_in_main_color);
cv::Mat cv_main_color_image = color_to_opencv(main_color_image);
// single-camera case
cv::Mat within_threshold_range = (cv_main_depth_in_main_color != 0) &
(cv_main_depth_in_main_color < depth_threshold);
// show the close details
cv_main_color_image.copyTo(output_image, within_threshold_range);
// hide the rest with the background image
background_image.copyTo(output_image, ~within_threshold_range);
cv::imshow("Green Screen", output_image);
In Python, using pyk4a, I have translated it to:
capture = k4a.get_capture()
if(np.any(capture.depth) and np.any(capture.color)):
    color = capture.color
    depth_in_color_camera_space = capture.transformed_depth
    depth_in_color_camera_space = cv2.normalize(depth_in_color_camera_space, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
However, now I am stuck on creating the mask. I have tried multiple things with OpenCV's copyTo, threshold and bitwise_and, but none worked for me.
To help understand the C++ and python code better:
create_depth_image_like():
This is a method provided by the sample that creates a k4a::image. This is not needed for python because it just provides an array.
static k4a::image create_depth_image_like(const k4a::image &im)
{
return k4a::image::create(K4A_IMAGE_FORMAT_DEPTH16,
im.get_width_pixels(),
im.get_height_pixels(),
im.get_width_pixels() * static_cast<int>(sizeof(uint16_t)));
}
depth_image_to_color_camera():
This method is provided by the Azure Kinect SDK and is available in python as capture.transformed_depth. It transforms the depth image to be in the same image space as the color image or vice versa.
depth_to_opencv():
I am not sure if I need to use this method. Also, I don't really know what it does to be honest.
static cv::Mat depth_to_opencv(const k4a::image &im)
{
return cv::Mat(im.get_height_pixels(),
im.get_width_pixels(),
CV_16U,
(void *)im.get_buffer(),
static_cast<size_t>(im.get_stride_bytes()));
}
color_to_opencv():
Same goes for this one. I think I don't need them because python returns an array already, however, I am not sure.
static cv::Mat color_to_opencv(const k4a::image &im)
{
cv::Mat cv_image_with_alpha(im.get_height_pixels(), im.get_width_pixels(), CV_8UC4, (void *)im.get_buffer());
cv::Mat cv_image_no_alpha;
cv::cvtColor(cv_image_with_alpha, cv_image_no_alpha, cv::COLOR_BGRA2BGR);
return cv_image_no_alpha;
}
So, what's left for me to translate to Python is the mask (defined in the C++ code as within_threshold_range). However, this is the part I just can't figure out. Any help would be greatly appreciated!
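For what it's worth, the C++ mask expression maps almost one-to-one onto NumPy boolean arrays. The sketch below is only an illustration under some assumptions: `green_screen` is a made-up helper name, `depth_mm` stands for the raw 16-bit transformed depth in millimetres, and `color` and `background` are assumed to have the same shape. Note that the comparison is done on the raw depth; normalising to 0-255 first would change the units, so a threshold in millimetres would no longer mean anything.

```python
import numpy as np

def green_screen(color, depth_mm, background, depth_threshold_mm):
    """Keep colour pixels that are closer than the threshold, show background elsewhere."""
    # Same test as the C++ sample: depth is valid (non-zero) and below the threshold
    within_threshold_range = (depth_mm != 0) & (depth_mm < depth_threshold_mm)
    # Add a trailing axis so the 2-D mask broadcasts over the colour channels
    return np.where(within_threshold_range[..., None], color, background)

# Tiny synthetic frame: 2 x 3 pixels, 3 channels
color = np.full((2, 3, 3), 200, dtype=np.uint8)
background = np.zeros((2, 3, 3), dtype=np.uint8)
depth_mm = np.array([[0, 500, 1500],
                     [999, 1000, 1]], dtype=np.uint16)
output_image = green_screen(color, depth_mm, background, 1000)
```

`np.where` plays the role of the two `copyTo` calls: where the mask is true the colour image is taken, everywhere else the background.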
I want to detect the small squares circled in red in the image. The problem is that they sit on top of a white line. I want to know how to separate those squares from the white line and detect them.
I have used OpenCV with Python. So far, I have cropped the image to get only the circular part, then cropped again to get the required part, which is the white line. Then I used erosion so that the white line vanishes and the squares remain. Then I used Hough circles to detect the squares. This works for some images, but it does not generalize. Please help me find a general approach. Let me know the logic and also the Python code.
Also, could anyone help me detect the ArUco marker in the image? It is getting rejected and I don't know why.
The image is at this link: Detect small squares on an image
Here's C++ code using distanceTransform.
Since I used almost only OpenCV functions, you can probably convert it to Python easily.
I removed the white stripe at the top of the image manually; I hope this isn't a problem.
int main()
{
cv::Mat input = cv::imread("C:/StackOverflow/Input/SQUARES.png", cv::IMREAD_GRAYSCALE);
cv::Mat thres = input > 0; // make binary mask
cv::Mat dst;
cv::distanceTransform(thres, dst, CV_DIST_L2, 3);
double min, max;
cv::Point minPt, maxPt;
cv::minMaxLoc(dst, &min, &max, 0, 0);
double distThres = max*0.65; // a real clustering would be better. This assumes that the white circle thickness is about 50% of the square size, so 65% should be ok...
cv::Mat squaresMask = dst >= distThres;
cv::imwrite("C:/StackOverflow/Input/SQUARES_mask.png", squaresMask);
std::vector<std::vector<cv::Point> > contours;
cv::findContours(squaresMask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_NONE);
cv::Mat output;
cv::cvtColor(input, output, cv::COLOR_GRAY2BGR);
for (int i = 0; i < contours.size(); ++i)
{
cv::Point2f center;
float radius;
cv::minEnclosingCircle(contours[i], center, radius);
cv::circle(output, center, 5, cv::Scalar(255, 0, 255), -1);
//cv::circle(output, center, radius, cv::Scalar(255, 0, 255), 1);
}
cv::imwrite("C:/StackOverflow/Input/SQUARES_output.png", output);
cv::imshow("output", output);
cv::waitKey(0);
}
This is the input:
This is the squaresMask after the distance transform:
And this is the result:
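To show why the distance transform separates the squares from the line, here is a toy NumPy-only sketch. The brute-force `distance_to_background` helper is a stand-in I wrote for illustration; it is only practical on small masks, and in real code `cv2.distanceTransform` would take its place. The 0.65 factor is the one from the C++ code, and the synthetic mask (a thin line with one square on it) is an assumption, not the original image.

```python
import numpy as np

def distance_to_background(mask):
    """Euclidean distance from every foreground pixel to the nearest background pixel."""
    mask = mask.astype(bool)
    fg = np.argwhere(mask)       # (row, col) of foreground pixels
    bg = np.argwhere(~mask)      # (row, col) of background pixels
    dist = np.zeros(mask.shape, dtype=np.float64)
    if fg.size == 0 or bg.size == 0:
        return dist
    # Squared distance from each foreground pixel to each background pixel
    d2 = ((fg[:, None, :] - bg[None, :, :]) ** 2).sum(axis=2)
    dist[fg[:, 0], fg[:, 1]] = np.sqrt(d2.min(axis=1))
    return dist

def thick_regions(mask, fraction=0.65):
    """Keep only pixels whose distance value is at least fraction * max distance."""
    dist = distance_to_background(mask)
    return dist >= fraction * dist.max()

# Toy input: a 3-pixel-thick line with an 11 x 11 square sitting on it
mask = np.zeros((41, 41), dtype=bool)
mask[19:22, 3:38] = True
mask[15:26, 15:26] = True
squares_mask = thick_regions(mask)
ys, xs = np.nonzero(squares_mask)
centre = (ys.mean(), xs.mean())
```

Pixels on the thin line never get far from the background, so they fall below the cut-off, while the middle of the square survives. Each surviving blob marks one square.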
I need to draw "soft" white circles (translucent borders) onto an image with OpenCV, but all I can find in the docs is how to draw 100% opaque circles with hard borders. Does anyone know how I could do this, or at least create the illusion that the circles "fade out" at the edges?
I felt like working on my OpenCV skills a bit - and learned quite a lot - cool question!
I generated a single channel image of alpha values - float to get fewer rounding errors, and single channel to save some memory. This represents how much of your circle is visible over the background.
The circle has an outer radius - the point at which it becomes fully transparent and an inner radius, the point where it stops being fully opaque. Radii between these two will be faded. So, set the IRADIUS very close to the ORADIUS for a steep, rapid falloff and set it a long way away for a slower tapering out.
I used an ROI to position the circle on the background and to speed things up by only iterating over the necessary rectangle of the background.
The only tricky part is alpha blending or compositing. You just have to know the formula for each pixel in the output image is:
out = (alpha * foreground) + (1-alpha) * background
Here is the code. I am not the world's best at OpenCV so there may be parts that can be optimised!
////////////////////////////////////////////////////////////////////////////////
// main.cpp
// Mark Setchell
////////////////////////////////////////////////////////////////////////////////
#include <opencv2/opencv.hpp>
#include <vector>
#include <cstdlib>
using namespace std;
using namespace cv;
#define ORADIUS 100 // Outer radius
#define IRADIUS 80 // Inner radius
int main()
{
// Create a blue background image
Mat3b background(400,600,Vec3b(255,0,0));
// Create alpha layer for our circle normalised to 1=>solid, 0=>transparent
Mat alpha(2*ORADIUS,2*ORADIUS,CV_32FC1);
// Now draw a circle in the alpha channel
for(auto r=0;r<alpha.rows;r++){
for(auto c=0;c<alpha.cols;c++){
int x=ORADIUS-r;
int y=ORADIUS-c;
float radius=hypot((float)x,(float)y);
auto& pixel = alpha.at<float>(r,c);
if(radius>ORADIUS){ pixel=0.0; continue;} // transparent
if(radius<IRADIUS){ pixel=1.0; continue;} // solid
pixel=1-((radius-IRADIUS)/(ORADIUS-IRADIUS)); // partial
}
}
// Create solid magenta rectangle for circle
Mat3b circle(2*ORADIUS,2*ORADIUS,Vec3b(255,0,255));
#define XPOS 20
#define YPOS 120
// Make an ROI on background where we are going to place circle
Rect ROIRect(XPOS,YPOS,ORADIUS*2,ORADIUS*2);
Mat ROI(background,ROIRect);
// Do the alpha blending thing
Vec3b *thisBgRow;
Vec3b *thisFgRow;
float *thisAlphaRow;
for(int j=0;j<ROI.rows;++j)
{
thisBgRow = ROI.ptr<Vec3b>(j);
thisFgRow = circle.ptr<Vec3b>(j);
thisAlphaRow = alpha.ptr<float>(j);
for(int i=0;i<ROI.cols;++i)
{
for(int c=0;c<3;c++){ // iterate over channels, result=circle*alpha + (1-alpha)*background
thisBgRow[i][c] = saturate_cast<uchar>((thisFgRow[i][c]*thisAlphaRow[i]) + ((1.0-thisAlphaRow[i])*thisBgRow[i][c]));
}
}
}
imwrite("result.png",background);
return 0;
}
This is with IRADIUS=80:
This is with IRADIUS=30:
Kudos and thanks to @Micka for sharing his code for iterating over a ROI here.
Oops, I just realised you were looking for a Python solution. Hopefully my code will give you some ideas for generating the soft circle mask, and I found an article here that shows some Python-style ways of doing it that you can mash up with my code.
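For reference, a rough NumPy sketch of the same idea might look like the following. It is not a line-by-line port: the function names `soft_circle_alpha` and `blend_circle` are invented for this example, it assumes the outer radius is larger than the inner one, and it assumes the circle's square fits entirely inside the background.

```python
import numpy as np

def soft_circle_alpha(outer_radius, inner_radius):
    """Float alpha layer: 1 inside inner_radius, 0 outside outer_radius, linear fade between."""
    size = 2 * outer_radius
    yy, xx = np.mgrid[0:size, 0:size]
    radius = np.hypot(xx - outer_radius, yy - outer_radius)
    alpha = 1.0 - (radius - inner_radius) / float(outer_radius - inner_radius)
    return np.clip(alpha, 0.0, 1.0)   # clipping handles the solid and transparent zones

def blend_circle(background, x, y, color, alpha):
    """Composite a solid colour over background at (x, y): out = alpha*fg + (1-alpha)*bg."""
    h, w = alpha.shape
    roi = background[y:y + h, x:x + w].astype(np.float64)
    fg = np.array(color, dtype=np.float64)
    a = alpha[..., None]              # broadcast alpha over the colour channels
    blended = a * fg + (1.0 - a) * roi
    background[y:y + h, x:x + w] = np.clip(np.rint(blended), 0, 255).astype(np.uint8)
    return background

# Blue background (BGR order) with a magenta soft circle, as in the C++ version
background = np.zeros((400, 600, 3), dtype=np.uint8)
background[:] = (255, 0, 0)
alpha = soft_circle_alpha(100, 80)
result = blend_circle(background, 20, 120, (255, 0, 255), alpha)
```

The array can be written out with `cv2.imwrite("result.png", result)` if OpenCV is available. Bringing `inner_radius` closer to `outer_radius` gives a steeper falloff, exactly as with IRADIUS and ORADIUS above.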
I've done extensive research and cannot find a combination of techniques that will achieve what I need.
I have a situation where I need to perform OCR on hundreds of W2s to extract the data for a reconciliation. The W2s are very poor quality, as they are printed and subsequently scanned back into the computer. The aforementioned process is outside of my control; unfortunately I have to work with what I've got.
I was able to successfully perform this process last year, but I had to brute force it as timeliness was a major concern. I did so by manually indicating the coordinates to extract the data from, then performing the OCR only on those segments one at a time. This year, I would like to come up with a more dynamic situation in the anticipation that the coordinates could change, format could change, etc.
I have included a sample, scrubbed W2 below. The idea is for each box on the W2 to be its own rectangle, and extract the data by iterating through all of the rectangles. I have tried several edge detection techniques but none have delivered exactly what is needed. I believe that I have not found the correct combination of pre-processing required. I have tried to mirror some of the Sudoku puzzle detection scripts.
Here is the result of what I have tried thus far, along with the Python code, which can be used with either OpenCV 2 or 3:
import cv2
import numpy as np
img = cv2.imread(image_path_here)
newx,newy = img.shape[1]/2,img.shape[0]/2
img = cv2.resize(img,(newx,newy))
blur = cv2.GaussianBlur(img, (3,3),5)
ret,thresh1 = cv2.threshold(blur,225,255,cv2.THRESH_BINARY)
gray = cv2.cvtColor(thresh1,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray,50,220,apertureSize = 3)
minLineLength = 20
maxLineGap = 50
lines = cv2.HoughLinesP(edges,1,np.pi/180,100,minLineLength,maxLineGap)
for x1,y1,x2,y2 in lines[0]:
    cv2.line(img,(x1,y1),(x2,y2),(255,0,255),2)
cv2.imshow('hough',img)
cv2.waitKey(0)
He he, edge detection is not the only way. As the edges are thick enough (at least one pixel everywhere), binarization allows you to singulate the regions inside the boxes.
By simple criteria you can get rid of clutter, and just bounding boxes give you a fairly good segmentation.
Let me know if you don't follow anything in my code. The biggest faults of this concept are:
1: Noisy breaks in the main box line would split it into separate blobs.
2: I don't know whether handwritten text is possible here, but letters overlapping the edges of boxes could be bad.
3: It does no orientation checking at all (you may actually want to improve this, as I don't think it would be too bad and it would give you more accurate handles). It depends on your boxes being approximately aligned to the x and y axes; if they are sufficiently skewed, you will get gross offsets on all your box corners (though it should still find them all).
I fiddled with the threshold set point a bit to get all the text separated from the edges; you could probably pull it even lower if necessary before you start breaking the main line. Also, if you are worried about line breaks, you could add together sufficiently large blobs into the final image.
Basically, the first step is fiddling with the threshold to find the most stable cutoff value (likely the lowest value that still keeps a connected box) for separating text and noise from the box.
Second, find the biggest positive blob (this should be the box grid). If your box doesn't stay in one piece, you may want to take a few of the largest blobs, though that will get sticky, so try to set the threshold so that you get a single blob.
The last step is to get the rectangles. To do this, I just look for negative blobs (ignoring the first outer area).
And here is the code (sorry that it is in C++, but hopefully you understand the concept and would write it yourself anyhow):
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <iostream>
#include <stdio.h>
#include <opencv2/opencv.hpp>
using namespace cv;
//Attempts to find the largest connected group of points (assumed to be the interconnected boundaries of the textbox grid)
Mat biggestComponent(Mat targetImage, int connectivity=8)
{
Mat inputImage;
inputImage = targetImage.clone();
Mat finalImage;// = inputImage;
int greatestBlobSize=0;
std::cout<<"Top"<<std::endl;
std::cout<<inputImage.rows<<std::endl;
std::cout<<inputImage.cols<<std::endl;
for(int i=0;i<inputImage.cols;i++)
{
for(int ii=0;ii<inputImage.rows;ii++)
{
if(inputImage.at<uchar>(ii,i)!=0)
{
Mat lastImage;
lastImage = inputImage.clone();
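Rect boundbox; //floodFill writes the bounding box here, so it must be a real object, not an uninitialized pointer
int blobSize = floodFill(inputImage, cv::Point(i,ii), Scalar(0),&boundbox,Scalar(200),Scalar(255),connectivity);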
if(greatestBlobSize<blobSize)
{
greatestBlobSize=blobSize;
std::cout<<blobSize<<std::endl;
Mat tempDif = lastImage-inputImage;
finalImage = tempDif.clone();
}
//std::cout<<"Loop"<<std::endl;
}
}
}
return finalImage;
}
//Takes an image that only has outlines of boxes and gets handles for each textbox.
//Returns a vector of rectangles, one bounding box per text box.
std::vector<Rect> boxCorners(Mat processedImage, int connectivity=4)
{
std::vector<Rect> boxHandles;
Mat inputImage;
bool outerRegionFlag=true;
inputImage = processedImage.clone();
std::cout<<inputImage.rows<<std::endl;
std::cout<<inputImage.cols<<std::endl;
for(int i=0;i<inputImage.cols;i++)
{
for(int ii=0;ii<inputImage.rows;ii++)
{
if(inputImage.at<uchar>(ii,i)==0)
{
Mat lastImage;
lastImage = inputImage.clone();
Rect boundBox;
if(outerRegionFlag) //This is to floodfill the outer zone of the page
{
outerRegionFlag=false;
floodFill(inputImage, cv::Point(i,ii), Scalar(255),&boundBox,Scalar(0),Scalar(50),connectivity);
}
else
{
floodFill(inputImage, cv::Point(i,ii), Scalar(255),&boundBox,Scalar(0),Scalar(50),connectivity);
boxHandles.push_back(boundBox);
}
}
}
}
return boxHandles;
}
Mat drawTestBoxes(Mat originalImage, std::vector<Rect> boxes)
{
Mat outImage;
outImage = originalImage.clone();
outImage = outImage*0; //really I am just being lazy, this should just be initialized with dimensions
for(int i=0;i<boxes.size();i++)
{
rectangle(outImage,boxes[i],Scalar(255));
}
return outImage;
}
int main() {
Mat image;
Mat thresholded;
Mat processed;
image = imread( "Images/W2.png", 1 );
Mat channel[3];
split(image, channel);
threshold(channel[0],thresholded,150,255,1);
std::cout<<"Computing biggest object"<<std::endl;
processed = biggestComponent(thresholded);
std::vector<Rect> textBoxes = boxCorners(processed);
Mat finalBoxes = drawTestBoxes(image,textBoxes);
namedWindow("Original", WINDOW_AUTOSIZE );
imshow("Original", channel[0]);
namedWindow("Thresholded", WINDOW_AUTOSIZE );
imshow("Thresholded", thresholded);
namedWindow("Processed", WINDOW_AUTOSIZE );
imshow("Processed", processed);
namedWindow("Boxes", WINDOW_AUTOSIZE );
imshow("Boxes", finalBoxes);
std::cout<<"waiting for user input"<<std::endl;
waitKey(0);
return 0;
}
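For anyone who prefers Python, the three steps described above (threshold, keep the largest foreground blob, take the bounding boxes of the enclosed background regions) can be sketched roughly as follows. This is a plain NumPy illustration, not a port of the C++: `label_components` and `find_boxes` are names I made up, the pure-Python flood fill is slow on full-size scans, and instead of skipping the first background region found I drop every region that touches the image border, which is a small variation on the original.

```python
from collections import deque
import numpy as np

def label_components(mask, connectivity=4):
    """Flood-fill labelling. Returns (labels, stats), stats[i] = (x, y, w, h, size) for label i+1."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    stats = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or labels[sy, sx]:
                continue
            label = len(stats) + 1
            labels[sy, sx] = label
            queue = deque([(sy, sx)])
            x0 = x1 = sx
            y0 = y1 = sy
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = label
                        queue.append((ny, nx))
            stats.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1, size))
    return labels, stats

def find_boxes(gray, thresh=150):
    """Return (x, y, w, h) for each background region enclosed by the largest dark blob."""
    dark = gray <= thresh                                   # step 1: lines and text
    labels, stats = label_components(dark, connectivity=8)
    if not stats:
        return []
    biggest = 1 + max(range(len(stats)), key=lambda i: stats[i][4])
    grid = labels == biggest                                # step 2: keep only the grid
    _, regions = label_components(~grid, connectivity=4)    # step 3: negative blobs
    h, w = gray.shape
    boxes = []
    for x, y, bw, bh, _ in regions:
        touches_border = x == 0 or y == 0 or x + bw == w or y + bh == h
        if not touches_border:
            boxes.append((x, y, bw, bh))
    return boxes

# Synthetic form: an outer frame, one vertical divider and a small blob of "text"
page = np.full((30, 40), 255, dtype=np.uint8)
page[5, 5:35] = page[24, 5:35] = 0
page[5:25, 5] = page[5:25, 34] = page[5:25, 20] = 0
page[10:12, 10:13] = 0
boxes = find_boxes(page)
```

Each returned rectangle can then be cropped from the original image and passed to OCR on its own.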
I have an image as below:
Can anyone tell me how to detect the number of circles in it? I'm using the Hough circle transform to achieve this, and this is my code:
# import the necessary packages
import numpy as np
import sys
import cv2
# load the image, clone it for output, and then convert it to grayscale
image = cv2.imread(str(sys.argv[1]))
output = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# detect circles in the image
circles = cv2.HoughCircles(gray, cv2.cv.CV_HOUGH_GRADIENT, 1.2, 5)
no_of_circles = 0
# ensure at least some circles were found
if circles is not None:
    # convert the (x, y) coordinates and radius of the circles to integers
    circles = np.round(circles[0, :]).astype("int")
    no_of_circles = len(circles)
    # loop over the (x, y) coordinates and radius of the circles
    for (x, y, r) in circles:
        # draw the circle in the output image, then draw a rectangle
        # corresponding to the center of the circle
        cv2.circle(output, (x, y), r, (0, 255, 0), 4)
        cv2.rectangle(output, (x - 5, y - 5), (x + 5, y + 5), (0, 128, 255), -1)
# show the output image
cv2.imshow("output", np.hstack([image, output]))
print 'no of circles',no_of_circles
I'm getting wrong answers with this code. Can anyone tell me where I went wrong?
I tried a tricky way to detect all circles.
I found the HoughCircles parameters manually:
HoughCircles( src_gray, circles, HOUGH_GRADIENT, 1, 50, 40, 46, 0, 0 );
The tricky part is
flip( src, flipped, 1 );
hconcat( src,flipped, flipped );
hconcat( flipped, src, src );
flip( src, flipped, 0 );
vconcat( src,flipped, flipped );
vconcat( flipped, src, src );
flip( src, src, -1 );
which will create a model like the one below before detection.
The result is like this:
The C++ code can be easily converted to Python.
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
using namespace std;
using namespace cv;
int main(int argc, char** argv)
{
Mat src, src_gray, flipped, display;
if (argc < 2)
{
std::cerr<<"No input image specified\n";
return -1;
}
// Read the image
src = imread( argv[1], 1 );
if( src.empty() )
{
std::cerr<<"Invalid input image\n";
return -1;
}
flip( src, flipped, 1 );
hconcat( src,flipped, flipped );
hconcat( flipped, src, src );
flip( src, flipped, 0 );
vconcat( src,flipped, flipped );
vconcat( flipped, src, src );
flip( src, src, -1 );
// Convert it to gray
cvtColor( src, src_gray, COLOR_BGR2GRAY );
// Reduce the noise so we avoid false circle detection
GaussianBlur( src_gray, src_gray, Size(9, 9), 2, 2 );
// will hold the results of the detection
std::vector<Vec3f> circles;
// runs the actual detection
HoughCircles( src_gray, circles, HOUGH_GRADIENT, 1, 50, 40, 46, 0, 0 );
// clone the colour, input image for displaying purposes
display = src.clone();
Rect rect_src(display.cols / 3, display.rows / 3, display.cols / 3, display.rows / 3 );
rectangle( display, rect_src, Scalar(255,0,0) );
for( size_t i = 0; i < circles.size(); i++ )
{
Point center(cvRound(circles[i][0]), cvRound(circles[i][1]));
int radius = cvRound(circles[i][2]);
Rect r = Rect( center.x-radius, center.y-radius, radius * 2, radius * 2 );
Rect intersection_rect = r & rect_src;
if( intersection_rect.width * intersection_rect.height > r.width * r.height / 3 )
{
// circle center
circle( display, center, 3, Scalar(0,255,0), -1, 8, 0 );
// circle outline
circle( display, center, radius, Scalar(0,0,255), 3, 8, 0 );
}
}
// shows the results
imshow( "results", display(rect_src));
// get user key
waitKey();
return 0;
}
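As a hint for the Python conversion, the flip/concat sequence and the final overlap test can be written with NumPy alone. This is a sketch under assumptions: the function names `mirror_tile` and `keep_circles_in_centre` are mine, circles are taken as `(x, y, r)` tuples such as `cv2.HoughCircles` returns, and shifting the kept circles back into the original image's coordinates is my addition (the C++ code simply displays the centre crop).

```python
import numpy as np

def mirror_tile(image):
    """3 x 3 mosaic with the original in the centre and mirrored copies around it."""
    flipped = np.fliplr(image)                       # flip(src, flipped, 1)
    row = np.hstack([image, flipped, image])         # the two hconcat calls
    mosaic = np.vstack([row, np.flipud(row), row])   # flip(..., 0) and the two vconcat calls
    return mosaic[::-1, ::-1]                        # flip(src, src, -1)

def keep_circles_in_centre(circles, tile_width, tile_height, min_overlap=1.0 / 3):
    """Keep circles whose bounding box overlaps the centre tile by more than min_overlap."""
    kept = []
    for cx, cy, r in circles:
        # Overlap of the circle's bounding box with the centre tile, per axis
        ix = max(0, min(cx + r, 2 * tile_width) - max(cx - r, tile_width))
        iy = max(0, min(cy + r, 2 * tile_height) - max(cy - r, tile_height))
        if ix * iy > (2 * r) * (2 * r) * min_overlap:
            # Shift back into the coordinates of the original image
            kept.append((cx - tile_width, cy - tile_height, r))
    return kept

image = np.arange(12).reshape(3, 4)
mosaic = mirror_tile(image)
kept = keep_circles_in_centre([(150, 150, 10), (100, 150, 10), (50, 50, 10)], 100, 100)
```

The mosaic is the same array that `np.pad(image, ..., mode='symmetric')` produces with a pad of one full image on each side, so that call could replace `mirror_tile`. The point of the mirroring is that circles cut off by the border become complete shapes that HoughCircles can vote for.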
This SO post describes detection of semi-circles, and may be a good start for you:
Detect semi-circle in opencv
If you get stuck in OpenCV, try coding up the solution yourself. Writing a Hough circle finder parameterized for your particular application is relatively straightforward. If you write application-specific Hough algorithms a few times, you should be able to write a reasonable solution in less time than it takes to sort through a bunch of google results, decipher someone else's code, and so on.
You definitely don't need Canny edge detection for an image like this, but it won't hurt.
Other libraries (esp. commercial ones) will allow you to set more parameters for Hough circle finding. I would've expected some overload of the HoughCircles function to allow a struct of search parameters to be passed in, including the minimum percentage of circle completeness (arc length) allowed.
Although it's good to learn both RANSAC and Hough techniques--and, over time, more exotic techniques--I wouldn't necessarily recommend using RANSAC when you have circles defined so nicely and crisply. Without offering specific evidence, I'll just claim that fiddling with RANSAC parameters may be less intuitive than fiddling with Hough parameters.
HoughCircles needs some parameter tuning to work properly.
It could be that in your case the default values of param1 and param2 (both 100) are not suitable.
You can fine-tune your detection with HoughCircles by computing the ultimate eroded points. That will give you the number of circles in your image.
If the input contains only circles and background, you can count the number of connected components and ignore the component associated with the background. This will be the simplest and most robust solution.
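A minimal sketch of the connected-component count, in pure Python so that it runs without OpenCV. The function name `count_components` and the toy grid are made up for illustration. It assumes the image has already been binarised so that circle pixels are truthy and background pixels are falsy, which means the background component is ignored simply by never starting a fill from it. Circles that touch each other would be counted as one.

```python
from collections import deque

def count_components(grid):
    """Count 4-connected groups of truthy cells in a 2-D grid (list of lists or array)."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    count = 0
    for sy in range(height):
        for sx in range(width):
            if not grid[sy][sx] or seen[sy][sx]:
                continue
            # New, unvisited foreground pixel: flood-fill the whole blob
            count += 1
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

# Toy binary image with three separate blobs
grid = [
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 0],
]
number_of_blobs = count_components(grid)
```

With OpenCV available, `cv2.connectedComponents` does the same job much faster; it counts the background as label 0, so you subtract one from the number it returns.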