Is there an image segmentation model that segments images without assigning labels (or bounding boxes) to the segmented parts? I need to segment images even for objects the model was not trained on, so I suspect I need a model that does not rely on specific labels for segmenting.
It sounds like you are looking for unsupervised image segmentation. These repositories include some potential solutions:
https://github.com/kanezaki/pytorch-unsupervised-segmentation
https://github.com/Mirsadeghi/Awesome-Unsupervised-Segmentation
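If a quick, dependency-light baseline is enough, label-free segmentation can also be sketched with plain k-means colour clustering in OpenCV (K is just the number of segments, not a class label; the file names are placeholders):

import cv2
import numpy as np

img = cv2.imread("input.png")                   # any image, no labels needed
pixels = img.reshape(-1, 3).astype(np.float32)  # N x 3 colour samples

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 4                                           # number of segments to produce
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Paint every pixel with the colour of its cluster centre.
segmented = centers.astype(np.uint8)[labels.flatten()].reshape(img.shape)
cv2.imwrite("segmented.png", segmented)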
Related
I am currently working on a project that requires me to segment parts of drone imagery into their regions of interest and to create masks based on them.
Below is a hand-made example of what the ideal output should look like:
Original image
Image with water body mask
Image with crops masks
Image with tillage masks
Image with road masks
Image with building masks
I am aware that the best way to do this is semantic segmentation using convolutional neural networks with training data, but the task allocated to me is a more basic separation through colour segmentation. Ideally, I would put all the green foliage and trees on one layer, the roads on another, the water bodies on their own layer, and the buildings and everything else on a further layer, based on their colours and contours after cleaning up the noise.
Enhanced contours
I increased the contrast of this image and, after clearing out some noise, applied Canny edge detection to enhance the contours, so that it would be easier to pick out different instances of objects.
What I would like to try is to create a mask based on the contours of an object, then check the colour underneath to assign it to a specific colour layer. Could anyone provide any ideas as to what sort of algorithm I could use to achieve this?
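Something like the following sketch is what I have in mind (OpenCV assumed; the hue band for foliage and the layer names are placeholders I would tune):

import cv2
import numpy as np

img = cv2.imread("drone.png")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
edges = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

layers = {"vegetation": np.zeros(img.shape[:2], np.uint8),
          "other": np.zeros(img.shape[:2], np.uint8)}

for cnt in contours:
    mask = np.zeros(img.shape[:2], np.uint8)
    cv2.drawContours(mask, [cnt], -1, 255, thickness=cv2.FILLED)
    h, s, v = cv2.mean(hsv, mask=mask)[:3]      # mean colour under the contour
    # Illustrative hue band for green foliage (OpenCV hue runs 0-179).
    target = "vegetation" if 35 <= h <= 85 else "other"
    layers[target] = cv2.bitwise_or(layers[target], mask)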
Also, I understand that the colour of the water and the roads can be very similar depending on the location, so colour alone is not a very reliable cue. Any advice on how I could make a better distinction between the two?
Any insights into this would be much appreciated, thanks!
I am trying different approaches to align images containing text using computer vision. I have tested the following image alignment approaches:
Probabilistic Hough Lines Transform to align images according to the detected lines. https://medium.com/p/97b61eeffb20 is my implementation, but it did not help as much as I expected.
Implemented SIFT and ORB to detect features and align images to a template image, but instead of aligning all images, it sometimes distorts them. I used https://pyimagesearch.com/2020/08/31/image-alignment-and-registration-with-opencv/ as a reference.
Edge detection followed by contour detection, corner detection, and perspective transformation. But it does not work with images having different background types. This is the reference example: https://pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/
Morphology followed by contour detection and masking. Reference: "Crop exactly document paper from image".
Trained a YOLO (You Only Look Once) object detector to detect the documents, but it outputs a bounding box, while my requirement is a quadrilateral with the four document corners from which I can align the document using a perspective transform.
Calculating the skewness and deskewing. Reference: https://github.com/sbrunner/deskew (a usage sketch follows this list).
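For reference, my use of the deskew library boils down to the following sketch (the file name is a placeholder):

import cv2
from deskew import determine_skew

img = cv2.imread("document.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
angle = determine_skew(gray)        # estimated text skew in degrees, or None

if angle is not None:
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    deskewed = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)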
But I could not perfectly align the document images (identity documents such as citizenship papers, passports, licenses, etc.) with different backgrounds using the above approaches.
This is a sample test image (important information is hidden for privacy reasons).
Are there any other image alignment approaches that can align document images reliably by correcting the skew of the available text? My main focus is to extract the information from the document using OCR while preserving the sequence of the information in the document image.
Thank you!
To me, the third approach seems to be the most promising. But as you said, a cluttered background is a problem. Two ideas come to mind:
Implement a GUI as a fallback solution, so the user can select the contour.
Render an artificial dataset of official documents against cluttered backgrounds and train a CNN to predict a segmentation map of the document. This map could then be used as an initialization for the edge detection / contour detection (a sketch of that step follows below). This answer contains two links to databases of images of official documents; maybe these are of some use to you.
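A sketch of that initialization step, assuming the CNN yields a binary document mask (the file names, output size, and corner ordering are placeholders; in practice the four corners must be sorted into a consistent order first):

import cv2
import numpy as np

mask = cv2.imread("document_mask.png", cv2.IMREAD_GRAYSCALE)  # binarized CNN output
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
doc = max(contours, key=cv2.contourArea)        # assume the document is the largest blob

peri = cv2.arcLength(doc, True)
quad = cv2.approxPolyDP(doc, 0.02 * peri, True) # ideally the four document corners

if len(quad) == 4:
    src = quad.reshape(4, 2).astype(np.float32) # must be reordered to match dst
    dst = np.float32([[0, 0], [500, 0], [500, 700], [0, 700]])
    M = cv2.getPerspectiveTransform(src, dst)
    aligned = cv2.warpPerspective(cv2.imread("document.png"), M, (500, 700))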
I have images that are 4928x3280 and I'd like to crop them into tiles of 640x640 with a certain percentage of overlap. The issue is that I have no idea how to handle the bounding boxes in my dataset when tiling: I found this paper (http://openaccess.thecvf.com/content_CVPRW_2019/papers/UAVision/Unel_The_Power_of_Tiling_for_Small_Object_Detection_CVPRW_2019_paper.pdf), but no code or similar showing how they did it. There are some examples on the internet that do YOLOv5 tiling, but without overlap, like this one (https://github.com/slanj/yolo-tiling).
Does anyone know how I could implement this myself, or does anyone have an example of this?
If you want a ready-to-go library that handles tiling and inference for YOLOv5, there is SAHI:
https://github.com/obss/sahi
You can use it to create tiles with the related annotations, run inference, and evaluate model performance.
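A hedged usage sketch for the slicing step (parameter names follow SAHI's documented slice_coco helper; verify against the repo's current API, and note all paths here are placeholders):

from sahi.slicing import slice_coco

# Slice images plus their COCO annotations into 640x640 tiles
# with 20% overlap in both directions.
coco_dict, coco_path = slice_coco(
    coco_annotation_file_path="annotations.json",
    image_dir="images/",
    output_coco_annotation_file_name="sliced",
    output_dir="sliced/",
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)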
I am new to deep learning, and I am trying to train a ResNet50 model to classify 3 different surgical tools. The problem is that every article I read tells me that I need to use 224 X 224 images to train ResNet, but the images I have are of size 512 X 288.
So my questions are:
Is it possible to use 512 X 288 images to train ResNet without cropping the images? I do not want to crop the image because the tools are positioned rather randomly inside the image, and I think cropping the image will cut off part of the tools as well.
For the training and test set images, do I need to draw a rectangle around the object I want to classify?
Is it okay if multiple different objects are in one image? The data set I am using often has multiple tools appearing in one image, and I wonder if I must only use images that only have one tool appearing at a time.
If I were to crop the images to fit one tool, will it be okay even if the sizes of the images vary?
Thank you.
Is it possible to use 512 X 288 images to train ResNet without cropping the images? I do not want to crop the image because the tools are positioned rather randomly inside the image, and I think cropping the image will cut off part of the tools as well.
Yes, you can train ResNet without cropping your images. You can resize them, or, if that is not possible for some reason, you can alter the network, e.g. add global pooling at the very end to account for the different input size (you might also need to change kernel sizes or the downsampling rate).
If your biggest issue here is that ResNet expects 224x224 while your images are 512x288, the simplest solution is to first resize them to 224x224. Only if that is not a possibility for some technical reason, create a fully convolutional network by adding global pooling at the end (I believe ResNet already has global pooling at the end; in case it does not, you can add it).
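As a quick sketch (PyTorch assumed, since the thread does not name a framework): torchvision's ResNet-50 ends in adaptive average pooling, so it already accepts non-224 inputs such as 512x288, and resizing remains the simplest route.

import torch
import torchvision

model = torchvision.models.resnet50(num_classes=3)  # 3 surgical tool classes

x = torch.randn(1, 3, 288, 512)   # one uncropped 512x288 image (H x W)
logits = model(x)                 # works: global pooling absorbs the size
print(logits.shape)               # torch.Size([1, 3])

# Alternative: resize every image to 224x224 before feeding the network.
resize = torchvision.transforms.Resize((224, 224))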
For the training and test set images, do I need to draw a rectangle around the object I want to classify?
For classification, no, you do not. Having a bounding box for an object is only needed if you want to do detection (that is when you want your model to also draw a rectangle around the objects of interest).
Is it okay if multiple different objects are in one image? The data set I am using often has multiple tools appearing in one image, and I wonder if I must only use images that only have one tool appearing at a time.
It is okay to have multiple different objects in one image, as long as they do not belong to different classes that you are training against. That is, if you are trying to classify apples vs. oranges, an image obviously cannot contain both of them at the same time. But if it contains anything else, e.g. a screwdriver, a key, a person, a cucumber, that is fine.
If I were to crop the images to fit one tool, will it be okay even if the sizes of the images vary?
It depends on your model. Cropping and image size are two different things: you can crop an image of any size and then resize the crop to your desired dimensions. You usually want all images to have the same size, as it makes your life easier, but it is not a hard condition, and based on your requirements you can work with varying sizes as well.
I have trained classifiers for object classification using OpenCV's cascade classification.
I have three classes and three *.xml files.
I know that a given region of the image must belong to one of the three classes.
However, OpenCV only provides the detectMultiScale function, so I must scan the image (or ROI) to find all possible objects in it.
Is there a method to classify whether one image (or ROI) matches a specified object or not?
Thank you!
From your question I understand that you want to classify three separate ROIs of an image. You might want to create three crops for the defined ROIs:
import cv2

img = cv2.imread("full_image.png")

# Define the ROI as (x, y, w, h); the values below are placeholders.
x, y, w, h = 100, 50, 200, 200
crop_img1 = img[y:y+h, x:x+w]   # NumPy slicing: rows (y) first, then columns (x)
# create crop_img2 and crop_img3 analogously
And apply a classifier on each of the three cropped images.
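For example, to check one crop against one of your three cascades (a sketch; the file name is a placeholder, and any detection inside the crop is treated as a class match):

gray1 = cv2.cvtColor(crop_img1, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier("class1.xml")   # one of your trained *.xml files
hits = cascade.detectMultiScale(gray1)
is_class1 = len(hits) > 0                       # empty result means no match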