How to extract symbols from Image? - python

I am about to start learning CV and ML. I want to start by solving a problem. Below I am sharing an image and I want to extract each symbol and location from an image and create a new image with those extracted symbols in a pattern just like in source image. After that, I will do a translation job. Right now how can I or which steps I should follow to extract the symbols and find those symbols from the dataset (in terms of Gardiner's sign list) and place in the new image?
I know there is some computer vision + machine learning is involved in this process because symbols are not 100% accurate because these are too old symbols. I don't know from where to start and end. I have plans to use Python. Also, share if you know anyone already done this. Thank you.

Run sobel edge detection on source images in Gardiner's sign list.
Train a CNN on the list.
Normalize the contrast of the source image.
Run sobel edge detection on the source image. (referred to as source image heretofore)
Evaluate in the CNN by varying heights and widths(from the largest to smallest) on the source image.
Select the highest probability glyph. Output the corresponding Glyph from Gardiner's list at that start position and the corresponding height and width.
I do not claim this can be done in six simple steps, but this is the approach I would take.

Related

yolo v8: does segment contain point?

I'm using yolo v8 to detect subjects in pictures. It's working well, and can create quite precise masks over subjects.
from ultralytics import YOLO
model = YOLO('yolov8x-seg.pt')
for output in model('image.jpg', return_outputs=True):
for segment in output['segment']:
print(segment)
The code above works, and generates a series of "segments", which are a list of points that define the shape of subjects on my image. That shape is not convex (for example horses).
I need to figure out if a random coordinate on the image falls within these segments, and I'm not sure how to do it.
My first approach was to build an image mask using PIL. That roughly worked, but it doesn't always work, depending on the shape of the segments. I also thought about using shapely, but it has restrictions on the Polygon classes, which I think will be a problem in some cases.
In any case, this really feels like a problem that could easily be solved with the tools I'm already using (yolo, pytorch, numpy...), but to be honest I'm too new to all this to figure out how to do it properly.
Any suggestion is appreciated :)
You should be able to get a segmentation mask from your model: imagine a binary image where black (zeros) represents the background and white (or other non zero values) represent an instance of a segmentation class.
Once you have the binary image you can use opencv's findContours function to get a the largest outer path.
Once you have that path you can use pointPolygonTest() to check if a point is inside that contour or not.

Python:How can I find the coordinates of a symbol with number inside (technical drawing)

I am writing a python tool to find specific symbols (e.g. a circle/square with a number inside) on a drawing pdf/screenshot.png
I know from another data source the specific number(s) that should be inside the circle/square.
Using opencv matchTemplate I can find symbols and its coordinates.
One way would be to created all possible symbols (so circles/squares with number 1 to 1000) and save them. Then use opencv to find it on the drawing since I know the number to be found, and thus the filled symbol.
I am sure that the is a smart way to do this. Can somebody guide me into the right direction.
Note: pdfminer will not work since I will not be able to distinguish between measurement numbers and the text coming from the symbol, but I could be wrong here.
I am also trying to solve a similar problem in a coding assignment. The input is a n low poly art illustration.
Once you find the location of the UFO's, you need to crop that part and pass it through a classifier to find the number that UFO contains. The classifier is trained on 5000 images.
I am now going to try the matchTemplate method suggested by you to find the co-ordinates of the UFOs.

Homography of soccer field

Okay so i am trying to find homography of a soccer match. What i have till now is
Read images from a folder which is basically many cropped images of a template soccer field. Basically this has images for center circle and penalty lines etc.
Read video stream from a file and crop it into many smaller segments.
Loop inside the images in video stream and inside that another loop for images that i read from folder.
Now in the two images that i get through iteration , i applied a green filter because of my assumption that field is green
Use orb to find points and then find matches.
Now the Problem is that because of players and some noise from croud, i am unable to find proper matches for homography. Also removing them is a problem because that also tends to hide the soccer field lines that i need to calculate the homography on.
Any suggestions on this is greatly appreciated. Also below are some sample code and images that i am using.
"Code being used"
Sample images
Output that i am getting
The image on right of output is a frame from video and that on left is the same sample image that i uploaded after filterGreen function as can be seen from the code.
Finally what i want is for the image to properly map to center circle so i can draw a cube in center, Somewhat similar to "This example" . Thanks in advance for helping me out.
An interesting technique to throw at this problem is RASL. It computes homographies that align stacks of related images. It does not require that you specify corresponding points on the images, but operates directly on the image pixels. It is robust against image occlusions (eg, players moving in the foreground).
I've just released a Python implementation here: https://github.com/welch/rasl
(there are also links there to the original RASL paper, MATLAB implementation, and data).
I am unsure if you'd want to crop the input images to that center circle, or if the entire frames can be aligned. Try both and see.

OpenCV: Compute confidence in image matching

i write an app where the user takes the picture of a logo and the app tries to find the right logo in its database. Therefore i use the cv2.SIFT() algorithm and basically a modified version of the find_obj.py example (https://code.ros.org/trac/opencv/browser/trunk/opencv/samples/python2/find_obj.py?rev=6080).
The script now gives me values for "matches" and "inliers" and I have trouble finding out how to compute the logo that fits best. First i just took the image with most matches or inliners but it was often the wrong choice. Then I used inliers/matches as a confidence value but it could well be that one logo has a value of 4/5 (0.8) and the right one has 16/24 (0.66). So I tried to weight the number of matches like this inliers/matches*matches*0.3 but of course i have no clue how to really weight it.
Any advice what to do?!
You could try taking each match and computing the alignment between your image and the reference image. Then warp your image to align with the reference image and compute the error in that space.

OCR of low-resolution text from screenshots

I'm writing an OCR application to read characters from a screenshot image. Currently, I'm focusing only on digits. I'm partially basing my approach on this blog post: http://blog.damiles.com/2008/11/basic-ocr-in-opencv/.
I can successfully extract each individual character using some clever thresholding. Where things get a bit tricky is matching the characters. Even with fixed font face and size, there are some variables such as background color and kerning that cause the same digit to appear in slightly different shapes. For example, the below image is segmented into 3 parts:
Top: a target digit that I successfully extracted from a screenshot
Middle: the template: a digit from my training set
Bottom: the error (absolute difference) between the top and middle images
The parts have all been scaled (the distance between the two green horizontal lines represents one pixel).
You can see that despite both the top and middle images clearly representing a 2, the error between them is quite high. This causes false positives when matching other digits -- for example, it's not hard to see how a well-placed 7 can match the target digit in the image above better than the middle image can.
Currently, I'm handling this by having a heap of training images for each digit, and matching the target digit against those images, one-by-one. I tried taking the average image of the training set, but that doesn't resolve the problem (false positives on other digits).
I'm a bit reluctant to perform matching using a shifted template (it'd be essentially the same as what I'm doing now). Is there a better way to compare the two images than simple absolute difference? I was thinking of maybe something like the EMD (earth movers distance, http://en.wikipedia.org/wiki/Earth_mover's_distance) in 2D: basically, I need a comparison method that isn't as sensitive to global shifting and small local changes (pixels next to a white pixel becoming white, or pixels next to a black pixel becoming black), but is sensitive to global changes (black pixels that are nowhere near white pixels become black, and vice versa).
Can anybody suggest a more effective matching method than absolute difference?
I'm doing all this in OpenCV using the C-style Python wrappers (import cv).
I would look into using Haar cascades. I've used them for face detection/head tracking, and it seems like you could build up a pretty good set of cascades with enough '2's, '3's, '4's, and so on.
http://alereimondo.no-ip.org/OpenCV/34
http://en.wikipedia.org/wiki/Haar-like_features
OCR on noisy images is not easy - so simple approaches no not work well.
So, I would recommend you to use HOG to extract features and SVM to classify. HOG seems to be one of the most powerful ways to describe shapes.
The whole processing pipeline is implemented in OpenCV, however I do not know the function names in python wrappers. You should be able to train with the latest haartraining.cpp - it actually supports more than haar - HOG and LBP also.
And I think the latest code (from trunk) is much improved over the official release (2.3.1).
HOG usually needs just a fraction of the training data used by other recognition methods, however, if you want to classify shapes that are partially ocludded (or missing), you should make sure you include some such shapes in training.
I can tell you from my experience and from reading several papers on character classification, that a good way to start is by reading about Principal Component Analysis (PCA), Fisher's Linear Discriminant Analysis (LDA), and Support Vector Machines (SVMs). These are classification methods that are extremely useful for OCR, and it turns out that OpenCV already includes excellent implementations on PCAs and SVMs. I haven't seen any OpenCV code examples for OCR, but you can use some modified version of face classification to perform character classification. An excellent resource for face recognition code for OpenCV is this website.
Another library for Python that I recommend you is "scikits.learn". It is very easy to send cvArrays to scikits.learn and run machine learning algorithms on your data. A basic example for OCR using SVM is here.
Another more complicated example using manifold learning for handwritten character recognition is here.

Categories

Resources