I've implemented a CBIR (content-based image retrieval) app using the standard ConvNet approach:
Use transfer learning to extract features from the image dataset
Index the extracted features for k-NN search
Given a search image, extract its features
Return the top 10 images closest to the query image according to k-NN
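For concreteness, here is a minimal sketch of such a pipeline, assuming a Keras ResNet50 backbone and a scikit-learn NearestNeighbors index (both are stand-ins; any pretrained backbone and nearest-neighbor index would do, and the dataset path is hypothetical):

```python
import glob
import numpy as np
from sklearn.neighbors import NearestNeighbors
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

# Pretrained backbone without the classifier head; global average pooling
# turns each image into a 2048-d feature vector.
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x, verbose=0)[0]

# Index the collection once (hypothetical dataset directory).
paths = sorted(glob.glob("dataset/*.jpg"))
features = np.stack([extract_features(p) for p in paths])
index = NearestNeighbors(n_neighbors=10, metric="cosine").fit(features)

# Query with a new image.
_, ids = index.kneighbors(extract_features("query.jpg").reshape(1, -1))
print([paths[i] for i in ids[0]])
```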
I am getting good results, but I want to improve them further by adding text search. For instance, when my query image is the steering wheel of a car, the close results include any circular object that resembles a steering wheel, such as a bike wheel. What would be the best way to add a text input, say "car part", so that only steering wheels similar to the search image are returned?
I have been unable to find a good way to combine the ConvNet with a text-search model to build an improved k-NN index.
My other idea is to use Elasticsearch for the text search, something Elasticsearch is good at. For instance, I would run the CBIR search described above, look up the descriptions of the returned results, and then run Elasticsearch over that subset of hits to produce the final results. I could also tag images with classes and let the user select or deselect groups of images of interest.
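A sketch of what that re-ranking could look like, assuming the official Elasticsearch Python client and a hypothetical `images` index with `id` and `description` fields:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def rerank_by_text(cbir_hits, query_text):
    """Keep only the CBIR candidates whose description matches the text query."""
    query = {
        "bool": {
            # Restrict to the image IDs returned by the visual search...
            "filter": {"terms": {"id": cbir_hits}},
            # ...and rank them by how well their description matches the text.
            "must": {"match": {"description": query_text}},
        }
    }
    res = es.search(index="images", query=query, size=10)
    return [hit["_source"]["id"] for hit in res["hits"]["hits"]]

# e.g. rerank_by_text(["img_17", "img_203"], "car part")
```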
I don't want to run the text search before the image search, because some of the images are poorly described, so a text search would miss them.
Any thoughts or ideas will be appreciated!
I have not found the original paper, but you might find this interesting: https://www.slideshare.net/xavigiro/multimodal-deep-learning-d4l4-deep-learning-for-speech-and-language-upc-2017
It is about finding a vector space in which both images and text live (a multimodal embedding). This way you can find text similar to an image, images matching a text, or use an image/text pair to find similar images.
I think this idea is an interesting starting point.
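To make this concrete, a sketch of querying such a joint space; `embed_image` and `embed_text` are hypothetical encoders trained to map into the same vector space:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def multimodal_query(query_image, query_text, gallery_vectors, alpha=0.5):
    """Blend image and text embeddings, then rank the gallery by cosine similarity."""
    q_img = normalize(embed_image(query_image))  # hypothetical image encoder
    q_txt = normalize(embed_text(query_text))    # hypothetical text encoder
    q = normalize(alpha * q_img + (1 - alpha) * q_txt)
    sims = gallery_vectors @ q                   # gallery rows assumed L2-normalized
    return np.argsort(-sims)[:10]                # indices of the 10 best matches
```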
I want to write a Python script that can do this kind of page unwarping.
In my case I just need very simple unwarping, as follows:
The background is always similar
The page is always placed at a similar position
The warp is always of the same type
I tried the following methods, but they didn't work out.
I tried many scanning apps, but no app can unwarp a 3D warp; for example, Microsoft Office Lens.
I tried page_dewarp.py, but it does not work with pages that have spaces between blocks of text or separate text segments. For most such images it just flips the curve from left to right (or vice versa) and is unable to detect the actual text area (see the attached example).
I found deep-learning-for-document-dewarping, which tries to solve this problem using pix2pixHD, but I am not sure it will work: the project ships no trained models and does not currently solve the problem. Should I train a model with training data as described for pix2pixHD, i.e. train_A (warped input images) and train_B (unwarped output images)? I can generate the training data by creating warped and unwarped image pairs in Blender: render a scanned book page flat and again warped, as if someone were photographing the pages, but virtually. This way I can generate a large number of images.
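As a lighter-weight alternative to Blender, warped/flat training pairs can also be synthesized directly with OpenCV's remap; a minimal sketch with a sinusoidal warp (the amplitude, wavelength, and file names are arbitrary choices):

```python
import cv2
import numpy as np

def warp_page(flat, amplitude=15.0, wavelength=300.0):
    """Apply a sinusoidal vertical displacement, mimicking a curled page."""
    h, w = flat.shape[:2]
    map_x, map_y = np.meshgrid(np.arange(w, dtype=np.float32),
                               np.arange(h, dtype=np.float32))
    # Shift each column up or down by a sine of its x position.
    map_y += amplitude * np.sin(2 * np.pi * map_x / wavelength)
    return cv2.remap(flat, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)

flat = cv2.imread("page_flat.png")   # train_B: the ground-truth flat page
warped = warp_page(flat)             # train_A: the synthetic warped input
cv2.imwrite("page_warped.png", warped)
```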
I am trying to detect and extract the labels and dimensions of a 2D technical drawing saved as a PDF, using Python. I came across a Python library called pytesseract, which has optical character recognition capability. I tried the demo on my image, but it fails to detect most of the labels/dimensions. Please suggest if there is another way to do it. Thank you.
Attached is a sample of the 2D technical drawing I am trying to process.
What I am trying to achieve is to obtain the coordinates of every dimension (the 160, 120, 10, 4x45, etc.) on the image, and to extract them as well.
About 16 months ago we asked ourselves the same question.
If you want to implement it yourself, I'd suggest the following process:
Extract the Canvas from the sheet
Separate the Cuts
Detect the Measure Regions on each Cut
Detect the individual attributes of the Measure Regions to understand where each Measure starts and ends. In your particular example that's relatively easy.
Run the detected Measure Labels through OCR (see the sketch after this list)
Associate the Labels with the Measures
Verify your results
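For the OCR step, a hedged starting point is pytesseract's image_to_data, which returns a bounding box per recognized word (the confidence cutoff and the input file name are placeholders). Note that dimension labels in drawings are often rotated, so you may need to rotate each Measure Region upright before OCR:

```python
import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread("drawing.png")  # hypothetical rendering of the PDF page
data = pytesseract.image_to_data(img, output_type=Output.DICT)

for i, text in enumerate(data["text"]):
    # Skip empty and low-confidence boxes.
    if text.strip() and int(data["conf"][i]) > 60:
        x, y, w, h = (data["left"][i], data["top"][i],
                      data["width"][i], data["height"][i])
        print(f"{text!r} at ({x}, {y}), size {w}x{h}")
```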
Alternatively you can also run it through our API and get the results as JSON.
Here's a quick visualization of the result:
Drawing Read (GT stands for General Tolerances)
I am about to start learning CV and ML, and I want to start by solving a problem. Below I am sharing an image. I want to extract each symbol and its location from the image, and create a new image with those extracted symbols arranged in the same pattern as in the source image. After that, I will do a translation job. Which steps should I follow to extract the symbols, match them against a dataset (Gardiner's sign list), and place them in the new image?
I know some computer vision and machine learning is involved in this process, because the symbols are not 100% consistent; they are very old. I don't know where to start or end. I plan to use Python. Also, please share if you know of anyone who has already done this. Thank you.
Run Sobel edge detection on the images in Gardiner's sign list.
Train a CNN on the list.
Normalize the contrast of the source image.
Run Sobel edge detection on the source image (referred to as the source image hereafter).
Evaluate windows of varying heights and widths (from largest to smallest) of the source image in the CNN (see the sketch below).
Select the highest-probability glyph, and output the corresponding glyph from Gardiner's list at that start position with the corresponding height and width.
I do not claim this can be done in six simple steps, but this is the approach I would take.
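As a hedged sketch of the evaluation step, assuming a trained Keras classifier `model` with a fixed 64x64 input (the window sizes, stride, and 0.9 cutoff are arbitrary):

```python
import cv2
import numpy as np

def sliding_window_detect(edges, model, sizes=(96, 64, 48), stride=16):
    """Scan the edge image at several window sizes, largest first,
    and record every window that the CNN classifies confidently."""
    detections = []
    for size in sizes:
        for y in range(0, edges.shape[0] - size + 1, stride):
            for x in range(0, edges.shape[1] - size + 1, stride):
                patch = cv2.resize(edges[y:y + size, x:x + size], (64, 64))
                probs = model.predict(patch[np.newaxis, ..., np.newaxis],
                                      verbose=0)[0]
                k = int(np.argmax(probs))
                if probs[k] > 0.9:  # arbitrary confidence cutoff
                    detections.append((probs[k], k, x, y, size))
    return detections  # (score, glyph class, x, y, window size) tuples
```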
I am new to image processing. I'm using the OpenCV library with Python. I need to extract symbols and the handwritten texts related to those symbols for further work. I have seen developers do handwritten text recognition with neural networks, k-NN, and other techniques.
My question is: what is the best way to extract these symbols and the handwritten texts related to them?
Example diagram:
Details I need to extract:
The number of circles in the diagram.
The text inside them.
The words within square brackets.
Whether they are connected with arrows or not.
There is a method called SWT, the Stroke Width Transform.
Please see the paper below; if you search for it by name, you can find implementations that students have written for school projects.
Using this method, text detection can be applied, but it is not a one-day job.
Paper: Detecting Text in Natural Scenes with Stroke Width Transform
Hope that it helps.
For handwritten text recognition, try using TensorFlow. Their website has a simple example for digit recognition (with training data). You can use it to implement your own application for recognizing handwritten alphabets as well. (You'll need to get training data for this, though; I used a training dataset provided by NIST.)
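For instance, the digit-recognition example boils down to something like the following in Keras (a minimal sketch on MNIST; substitute your own handwriting dataset, e.g. NIST letters, for the alphabet case):

```python
import tensorflow as tf

# MNIST ships with Keras: 28x28 grayscale digits, labels 0-9.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3)
print(model.evaluate(x_test, y_test))
```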
If you are using OpenCV with Python, the Hough transform can detect circles in images. You might miss some hand-drawn circles, but there are ways to detect ovals and other closed shapes.
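A minimal sketch with cv2.HoughCircles (the parameters below are guesses that will need tuning for hand-drawn circles, and the file name is a placeholder):

```python
import cv2
import numpy as np

img = cv2.imread("diagram.png")
gray = cv2.medianBlur(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 5)

circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=30,
                           param1=100, param2=30, minRadius=10, maxRadius=100)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(img, (x, y), r, (0, 255, 0), 2)  # draw detections for inspection
    print(f"Found {len(circles[0])} circles")
cv2.imwrite("circles.png", img)
```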
For handwritten character recognition, there are lots of libraries available.
Since you are new to this area, I strongly recommend LearnOpenCV and PyImageSearch to help you get familiar with the algorithms available for this kind of task.
I have several kinds of paper sheets, and I am writing a Python script with OpenCV to recognize and classify them. I am stuck on how to tell that two sheets are of the same kind. For example, I attached two pictures: picture 1 is the template, and picture 2 is a sheet I need to check against the template. I don't need to match the text, only the form, so that I can classify identical sheets among many.
I have corrected the skew of the paper and detected some lines, but I don't know how to match the lines and judge whether a sheet is the same kind as the template.
Can anyone give me advice on a matching algorithm?
I'm not sure whether such a paper form is rich enough in visual information for this solution, but I would start with feature detection and homography calculation (see the OpenCV tutorial Features2D + Homography). From there you can adapt the 2D features to your problem.
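A rough sketch of that approach with ORB features and a RANSAC homography (file names, the 50-match cutoff, and the reprojection threshold are placeholders):

```python
import cv2
import numpy as np

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
candidate = cv2.imread("sheet.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(template, None)
kp2, des2 = orb.detectAndCompute(candidate, None)

# Hamming distance suits ORB's binary descriptors; crossCheck drops weak matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
good = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# A high inlier ratio suggests the sheet has the same layout as the template.
print("inlier ratio:", mask.sum() / len(mask))
```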
Check out the findContours and matchShapes functions. Either way, you are much better off matching a specific visual ID within the form that is representative of the form, like a really simple kind of barcode.
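For the contour route, a minimal matchShapes sketch (the Otsu binarization and the 0.1 threshold are arbitrary choices):

```python
import cv2

def largest_contour(path):
    """Binarize the sheet and return its largest external contour."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

# matchShapes compares Hu-moment signatures; lower scores mean more similar
# shapes (0 = identical).
score = cv2.matchShapes(largest_contour("template.png"),
                        largest_contour("sheet.png"),
                        cv2.CONTOURS_MATCH_I1, 0.0)
print("same form?", score < 0.1)
```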