I have 2 datasets of images. One contains perfectly square images, so resizing to 224x224 for a CNN will not cause any distortion; the other dataset is not square, so resizing to 224x224 will distort the images.
I will split the sets into train and validation. Is this a good way to train the model? Will there be any bias in the model?
I am afraid the model will learn to identify the distortion rather than the real differences between the 2 sets.
If you want to preserve your data, you can crop the non-square images randomly to make them square. That way your model looks at a cropped part of the image rather than a distorted one. Saving the cropped images also increases the amount of data you have, but simply using the random-crop function of the dataloader will streamline the process. Cropping is a good augmentation technique for preprocessing the data; a sketch is shown below.
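A minimal sketch of this, assuming PyTorch/torchvision (the resize-to-256-then-crop-to-224 sizes are just a common convention, not a requirement):

```python
from torchvision import transforms

# Training: resize the shorter side, then take a random square crop,
# so the aspect ratio is preserved and no distortion is introduced.
train_transform = transforms.Compose([
    transforms.Resize(256),       # shorter side -> 256, aspect ratio kept
    transforms.RandomCrop(224),   # random 224x224 window (also acts as augmentation)
    transforms.ToTensor(),
])

# Validation: a deterministic center crop is usually preferred.
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
```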
Related
How do you use a pretrained model that expects 512x512 images on real images with different sizes, like YOLO object detection on a real image?
CNNs require fixed image sizes, so how do you manage to use the models on real images larger than the inputs?
If it is just about the image size, you can resize your image to the same size as the model input. When you receive the output, assuming you have bounding boxes or locations etc., you can always rescale them back to the original image size. Many ML/DL frameworks provide this functionality.
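A rough sketch of the idea, assuming OpenCV for resizing and a hypothetical `model.predict` that returns boxes as (x1, y1, x2, y2) in input-image coordinates:

```python
import cv2
import numpy as np

def detect_full_image(model, image, input_size=512):
    """Resize to the model's input size, then rescale the boxes back to the original image."""
    h, w = image.shape[:2]
    resized = cv2.resize(image, (input_size, input_size))
    boxes = model.predict(resized[None, ...])  # placeholder API; shape (N, 4) as x1, y1, x2, y2
    # Undo the resize: scale x-coordinates by w/input_size and y-coordinates by h/input_size.
    scale = np.array([w / input_size, h / input_size, w / input_size, h / input_size])
    return boxes * scale
```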
I am working on a 3D image segmentation task, but the length of the z-axis is different in every image. For convolutional neural networks, I think the length should be the same in all images. How can I handle this?
I'm not sure what you're working with, but you should transform your input before passing it through your network (e.g. in a data loader). In this case, one of the transforms should be a resize operation (to the appropriate dimensions).
https://pytorch.org/vision/stable/transforms.html
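For 3D volumes specifically, one option (a sketch, assuming PyTorch; the target sizes are arbitrary) is to resample every volume to a fixed depth with trilinear interpolation:

```python
import torch
import torch.nn.functional as F

def resize_volume(volume, target_depth=64, height=224, width=224):
    """Resample a (C, D, H, W) volume with variable depth D to a fixed size."""
    volume = volume.unsqueeze(0)  # (1, C, D, H, W), since interpolate expects a batch dimension
    volume = F.interpolate(
        volume,
        size=(target_depth, height, width),
        mode="trilinear",
        align_corners=False,
    )
    return volume.squeeze(0)      # back to (C, target_depth, H, W)
```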
I have a simple question for some of you. I have worked through some image classification tutorials, only the simpler ones like the MNIST dataset. Then I noticed that they do this:
train_images = train_images / 255.0
Now I know that every value of the matrix (which is the image) gets divided by 255.0. If I remember correctly this is called normalization, right? (Please correct me if I am wrong, otherwise tell me that I am right.)
I'm just curious: is there a "BETTER WAY", "ANOTHER WAY" or "THE BEST WAY" to pre-process or clean images before the cleaned images are fed to the network for training?
If you would like to provide some sample source code, please be my guest. I would love to look at code samples.
Thank you!
Pre-processing images prior to image classification can include the following:
normalisation: which you already mentioned
reshaping into a uniform resolution (img height x img width): higher resolution generally leads to better learning, while a smaller resolution may lose important features. Some models have a default input size that you can refer to. The average size of all your images can also be used.
color channel: 1 refers to gray-scale and 3 refers to RGB. You can set this depending on your application.
data augmentation: if your model is overfitting or your dataset is small, you can expand your dataset by altering the original images (flipping, rotating, cropping, zooming, ...).
image segmentation: segmentation can be performed to highlight the areas or boundaries that may benefit your application. For example, in medical image classification, some parts of the body may be masked to enhance classification performance.
For example, I recently worked on image classification of lung CT scan images. For pre-processing, I reshaped the images and made them gray-scale, then performed image segmentation to highlight the lungs, and finally normalised the pixel values before feeding the images into my classification model. Depending on your application, there may be other pre-processing techniques you might want to consider. A minimal pipeline along these lines is sketched below.
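As a concrete illustration of the resizing, gray-scale conversion, augmentation and normalisation steps above, here is a sketch assuming PyTorch/torchvision; the mean/std values are placeholders you would replace with your dataset's statistics:

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # color channel: 1 = gray-scale
    transforms.Resize((224, 224)),                # uniform resolution
    transforms.RandomHorizontalFlip(),            # simple data augmentation (training only)
    transforms.ToTensor(),                        # scales pixel values to [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # normalisation with placeholder statistics
])
```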
I have a model trained from RGB image samples that take a 31x31 pixel region as input and produces a single classification for the center pixel.
I'd like to apply this model over an entire image to recover effectively a new image of classifications for each pixel. Since this isn't a convolution, I'm not sure what the preferred way to do this is in TensorFlow.
I know this is possible by exploding the image for inference into a ton of smaller tensors but this seems like a colossal waste since each pixel will be duplicated 961 times. Is there a way around this?
Make your model a fully-convolutional neural network, so a 31x31 input produces a single label and a larger image produces a whole grid of labels (one per 31x31 window, spaced by the network's overall stride). This removes the redundant computation of the windowing method you mentioned.
If the network has fully-connected layers, they can be replaced with convolutional ones: the first with a kernel that spans the remaining feature map, the later ones with 1x1 kernels. A sketch of this is below.
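A minimal sketch of such a conversion, assuming tf.keras (the layer widths are illustrative, not taken from your original model): the model accepts any spatial size, a 31x31 patch yields a 1x1 prediction, and a full HxW image yields an (H-30)x(W-30) map of per-pixel predictions in a single pass.

```python
import tensorflow as tf

def build_fcn(num_classes=2):
    inputs = tf.keras.Input(shape=(None, None, 3))                                  # any spatial size
    x = tf.keras.layers.Conv2D(32, 3, padding="valid", activation="relu")(inputs)   # 31 -> 29
    x = tf.keras.layers.Conv2D(64, 3, padding="valid", activation="relu")(x)        # 29 -> 27
    # Replaces Flatten + Dense: a convolution whose kernel spans the remaining 27x27 field.
    x = tf.keras.layers.Conv2D(128, 27, padding="valid", activation="relu")(x)      # 27 -> 1
    # Replaces the final Dense classifier: a 1x1 convolution over the feature channels.
    outputs = tf.keras.layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```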
I am trying to implement object detection on satellite images. I have an annotated dataset, but the images are large and the model accepts only 416x416 inputs. How can I pass small parts of the image to the network while ensuring the annotations are retained? Also, how do I merge the results at test time?
Crop them with a small overlap/padding to prevent border effects and merge them back. You can see how this is done here, line 552:
https://github.com/lopuhin/kaggle-dstl/blob/master/train.py
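Not the linked repository's code, but a rough framework-agnostic sketch of the tiling idea (the tile size, overlap and centre-inside-tile rule are illustrative choices):

```python
import numpy as np

def tile_image(image, boxes, tile=416, overlap=32):
    """Yield (patch, boxes_in_patch, (x0, y0)) tiles covering the whole image.

    boxes is an (N, 4) array of [x1, y1, x2, y2] in full-image coordinates.
    Edge tiles may be smaller than `tile` and can be padded before inference.
    """
    h, w = image.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            patch = image[y0:y0 + tile, x0:x0 + tile]
            kept = []
            for x1, y1, x2, y2 in boxes:
                cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
                # Keep boxes whose centre lies in this tile, shifted to tile coordinates.
                if x0 <= cx < x0 + tile and y0 <= cy < y0 + tile:
                    kept.append([x1 - x0, y1 - y0, x2 - x0, y2 - y0])
            yield patch, np.array(kept), (x0, y0)

# At test time: run the detector on each patch, add (x0, y0) back to the predicted
# boxes, and merge overlapping detections (e.g. with non-maximum suppression).
```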