I'm trying to see the output after embedding an image. I tried using an OpenCV image function and asked it to print the result, but it is not working. Any suggestions on how to test embedding using dlib?
Step 1: Get the position of each image in the embedding space.
Step 2: Visualize it. If the dimensionality is higher than 2D/3D, you can use a method like t-SNE to reduce it for visualization.
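A minimal sketch of those two steps, assuming dlib's face recognition model and 68-point shape predictor files are available locally, scikit-learn is installed for t-SNE, and the image file names are placeholders:

import dlib
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
face_encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

# Step 1: one 128-D descriptor per image (taking the first detected face).
embeddings = []
for path in ["img1.jpg", "img2.jpg", "img3.jpg"]:  # hypothetical file names
    img = dlib.load_rgb_image(path)
    face = detector(img)[0]
    shape = shape_predictor(img, face)
    embeddings.append(np.array(face_encoder.compute_face_descriptor(img, shape)))

# Step 2: project the 128-D descriptors down to 2D with t-SNE and plot them.
points = TSNE(n_components=2, perplexity=2).fit_transform(np.array(embeddings))
plt.scatter(points[:, 0], points[:, 1])
plt.show()

Printing or plotting the projected points is usually the quickest way to sanity-check that the embeddings separate different identities.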
I have images that are 4928x3280 and I'd like to crop them into tiles of 640x640 with a certain percentage of overlap. The issue is that I have no idea how to deal with the bounding boxes of these files in my dataset. I found this paper (http://openaccess.thecvf.com/content_CVPRW_2019/papers/UAVision/Unel_The_Power_of_Tiling_for_Small_Object_Detection_CVPRW_2019_paper.pdf), but no code or anything showing how they did it. There are some examples on the internet that do YOLOv5 tiling, but without overlap, like this one (https://github.com/slanj/yolo-tiling).
Does anyone know how I could make this myself or if someone has an example of this for me?
If you want a ready-to-go library that makes tiling and inference possible for YOLOv5, there is SAHI:
https://github.com/obss/sahi
You can use it to create tiles with the corresponding annotations, run inference, and evaluate model performance.
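For example, sliced inference with a YOLOv5 model might look roughly like this. This is a sketch based on recent SAHI versions; the exact class and argument names depend on the version you install, and the weights and image paths are placeholders:

from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov5",
    model_path="yolov5s.pt",       # placeholder weights path
    confidence_threshold=0.4,
    device="cpu",
)

result = get_sliced_prediction(
    "large_image.jpg",             # placeholder image path
    detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,      # 20% overlap between tiles
    overlap_width_ratio=0.2,
)
result.export_visuals(export_dir="output/")

SAHI also provides utilities for slicing a dataset together with its annotations, which covers the bounding-box bookkeeping from the question.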
I'm trying to automatically draw a mesh or grid over a face, similar to the image below, to use the result in a blog post that I'm writing. However, my knowledge of computer vision is not enough to recognize which model or algorithm is behind these types of cool visualizations.
Could someone point me to a link to read or a starting point?
Using Python, OpenCV, and dlib, the closest thing I found is something called Delaunay triangulation, but looking at the results I'm not sure it's exactly what I'm after.
Putting it in a few words what I have so far is:
Detect all faces in the image and compute their landmarks using dlib's dlib.get_frontal_face_detector() and dlib.shape_predictor() methods.
Use cv2.Subdiv2D() from OpenCV to compute a 2D subdivision based on the landmarks. In particular, I'm getting the Delaunay triangles using the getTriangleList() method of the resulting subdivision.
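A minimal sketch of these two steps, assuming the 68-point shape predictor file is available locally and the input image is a placeholder called face.jpg:

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = cv2.imread("face.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = img.shape[:2]

for face in detector(gray):
    # Landmarks for this face.
    shape = predictor(gray, face)
    points = [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]

    # Delaunay subdivision over the landmark points.
    subdiv = cv2.Subdiv2D((0, 0, w, h))
    for x, y in points:
        subdiv.insert((float(x), float(y)))

    # Draw every triangle whose vertices lie inside the image.
    for x1, y1, x2, y2, x3, y3 in subdiv.getTriangleList():
        tri = np.array([[x1, y1], [x2, y2], [x3, y3]], dtype=np.int32)
        if (tri[:, 0] >= 0).all() and (tri[:, 0] < w).all() and (tri[:, 1] >= 0).all() and (tri[:, 1] < h).all():
            cv2.polylines(img, [tri], True, (0, 255, 0), 1)

cv2.imwrite("mesh.jpg", img)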
The complete code is available here.
However, the result is not very attractive, perhaps because the subdivision uses triangles instead of larger polygons, and I want to check whether I can improve it!
I have a large image from which I need to extract some information. I am using the Python OpenCV library for image enhancement. Using OpenCV methods I extracted the part of the image that interests me. It is given below.
Tesseract is not able to distinguish between 0 and O: it outputs all zeroes. I tried --psm 6 and other options, but to no avail. I am using the latest stable release of Tesseract (v3) on Windows.
I am continuing to work on this problem. Any help would be appreciated. Thanks.
EDIT:
I found a solution for this: using the box output from Tesseract. You need to pass makebox as an argument to the tesseract command. The box output contains the X and Y coordinates of a 'box' around each character read. I derived the width-to-height ratio and, with some input images, trained a logistic regression model to predict whether a character is a 0 or an O. Then I used this trained model on test images and it worked like a charm.
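A rough sketch of that idea, assuming scikit-learn and a .box file produced by running tesseract with makebox; the training ratios and labels below are placeholders you would replace with measurements from your own labelled images:

from sklearn.linear_model import LogisticRegression

def aspect_ratios(box_path):
    # Each line of a tesseract .box file looks like:
    # "<char> <left> <bottom> <right> <top> <page>"
    ratios = []
    with open(box_path) as f:
        for line in f:
            char, left, bottom, right, top, _page = line.split()
            width = int(right) - int(left)
            height = int(top) - int(bottom)
            if char in ("0", "O") and height > 0:
                ratios.append(width / height)
    return ratios

# Placeholder training data: width/height ratios paired with hand-assigned
# labels (0 = digit zero, 1 = letter O). Replace with real measurements.
X = [[0.55], [0.58], [0.60], [0.85], [0.88], [0.90]]
y = [0, 0, 0, 1, 1, 1]
model = LogisticRegression().fit(X, y)

# Re-classify every character tesseract read as 0 or O in a new image.
for ratio in aspect_ratios("test_image.box"):
    print("O" if model.predict([[ratio]])[0] == 1 else "0")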
I am trying to train my own image classifier with py-faster-rcnn using my own images.
It looks rather simple in the example here, but they are using a ready-made dataset (INRIA Person). The dataset is already structured and cropped into sub-images (it actually contains both the original images and the cropped person images from them), with a text annotation for each image giving the coordinates of the crops. Pretty straightforward.
Still, I have no idea how this is done - do they use some sort of tool for this (I can hardly imagine that such large amounts of data are cropped and annotated manually)?
Could anyone please suggest a solution for this one? Thanks.
I'm kind of new to both OCR and Python.
What I'm trying to achieve is to run Tesseract from a Python script to 'recognize' some particular figures in a .tif.
I thought I could do some training for Tesseract, but I didn't find any similar topic on Google or here on SO.
Basically I have a .tif that contains several images (like an 'arrow', a 'flower', and other icons), and I want the script to print the name of each icon it finds as output. If it finds an arrow, it should print 'arrow'.
Is it feasible?
This is by no means a complete answer, but if there are multiple images in the tif and if you know the size in advance, you can standardize the image samples prior to classifying them. You would cut up the image into all the possible rectangles in the tif.
So when you create a classifier (I don't mention the methods here), the end result would take a synthesis of classifying all of the smaller rectangles.
So if, given a tif, the 'arrow' or 'flower' images are, say, 16px by 16px, you can use Python PIL to create the samples.
from PIL import Image

image_samples = []
im = Image.open("input.tif")
sample_dimensions = (16, 16)

for box in get_all_corner_combinations(im, sample_dimensions):
    image_samples.append(im.crop(box))

classifier = YourClassifier()
classifications = []
for sample in image_samples:
    classifications.append(classifier(sample))

label = fuse_classifications(classifications)
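The helper get_all_corner_combinations is left undefined above; one possible interpretation, generating non-overlapping boxes of the given size in reading order, could look like this:

def get_all_corner_combinations(im, size):
    """Return (left, top, right, bottom) boxes of the given size tiled across the image."""
    img_w, img_h = im.size
    w, h = size
    boxes = []
    for top in range(0, img_h - h + 1, h):
        for left in range(0, img_w - w + 1, w):
            boxes.append((left, top, left + w, top + h))
    return boxes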
Again, I didn't talk about the learning step of actually writing YourClassifier. But hopefully this helps with laying out part of the problem.
There is a lot of research on the subject of learning to classify images as well as work in cleaning up noise in images before classifying them.
Consider browsing through this nice collection of existing Python machine learning libraries.
http://scipy-lectures.github.com/advanced/scikit-learn/index.html
There are many techniques that relate to images as well.
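For instance, the YourClassifier placeholder from the sketch above could be backed by a scikit-learn model. A rough example, using hypothetical training images and labels passed in at construction time (unlike the bare YourClassifier() call above):

import numpy as np
from sklearn.svm import SVC

class YourClassifier:
    """Toy classifier: an SVM over flattened grayscale pixel values of same-sized samples."""

    def __init__(self, training_images, training_labels):
        # training_images: PIL images of known icons; training_labels: e.g. "arrow", "flower".
        X = [np.asarray(im.convert("L")).ravel() for im in training_images]
        self.svm = SVC().fit(X, training_labels)

    def __call__(self, sample):
        x = np.asarray(sample.convert("L")).ravel()
        return self.svm.predict([x])[0]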