I want to implement a data augmentation technique for training a CNN: cut up multiple images (4 images) of the same size, say 11x11, and mix them to produce another 11x11 image that combines random parts of those images into one.
My question: is there any library that can help me implement this RICAP algorithm?
Here is a link explaining the concept:
https://blog.roboflow.com/why-and-how-to-implement-random-crop-data-augmentation/
I'm using TensorFlow to train my deep learning model, and the images are created from pandas arrays.
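To make it concrete, here is a rough NumPy sketch of the mixing I have in mind (my own illustration of the idea; ricap_mix is a hypothetical helper, not from any library):

import numpy as np

def ricap_mix(imgs, rng=np.random):
    # imgs: four arrays, all shaped (H, W, C) with the same H and W
    H, W = imgs[0].shape[:2]
    # pick the boundary point that splits the output into four patches
    w = rng.randint(1, W)
    h = rng.randint(1, H)
    sizes = [(h, w), (h, W - w), (H - h, w), (H - h, W - w)]
    patches = []
    for img, (ph, pw) in zip(imgs, sizes):
        # take a random crop of the required patch size from each source image
        y = rng.randint(0, H - ph + 1)
        x = rng.randint(0, W - pw + 1)
        patches.append(img[y:y + ph, x:x + pw])
    top = np.concatenate(patches[:2], axis=1)
    bottom = np.concatenate(patches[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)  # same H x W as the inputs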
Thank you in advance!
I'm new to data augmentation, and so far I've gathered that it is used to make the dataset bigger by changing the data in it slightly (e.g. rotating or cropping) and adding the augmented images to the dataset. Does it work that way?
And if so, I have seen a lot of examples using the Albumentations library, and in those examples we just change images in the dataset with a certain probability, while the size of the dataset remains the same. I feel like I'm missing something.
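For reference, the pattern I keep seeing in those examples looks roughly like this (a minimal sketch; the transforms and probabilities are just illustrative):

import albumentations as A
import numpy as np

# each transform fires independently with probability p
transform = A.Compose([
    A.Rotate(limit=15, p=0.5),
    A.HorizontalFlip(p=0.5),
])

image = np.zeros((128, 128, 3), dtype=np.uint8)  # placeholder image
augmented = transform(image=image)["image"]      # applied on the fly, anew each epoch

Is the idea just that the model sees a differently transformed version of each image every epoch?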
Thank you in advance!
I have a folder with hundreds/thousands of images, some of which look alike. I would like to create clusters separating those images (those which look alike going into the same cluster).
I can't determine the number of clusters that will be needed; it depends on the images.
Does anyone have an idea how to do this using Python and OpenCV, and which algorithm to use?
I've done some research and found that AffinityPropagation or DBSCAN could be useful, but I don't know where to start (how to encode my images, what to pass to those algorithms, etc.).
Unfortunately it is not that simple with images, since naively clustering would result in clusters of images with the same colors, not the same "content". You can use a neural network as a feature extractor for the images; I see two options:
Use a pre-trained network and get the features from an intermediate layer
Train an autoencoder on your dataset, and use the latent features
Option 1 is cheaper since you can easily find pre-trained models; option 2 is much more computationally expensive but should work better, especially if there is no pre-trained model for your domain.
This tutorial (randomly found on the internet) seems to be a good introduction to option 2.
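A minimal sketch of option 1 (assuming TensorFlow/Keras for the pre-trained model and scikit-learn for the clustering; the file list and the eps value are placeholders you would tune):

import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.cluster import DBSCAN

# pre-trained network with the classification head removed; global average
# pooling turns each image into a single feature vector
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(paths):
    feats = []
    for p in paths:
        img = image.load_img(p, target_size=(224, 224))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
        feats.append(model.predict(x, verbose=0)[0])
    return np.array(feats)

features = extract_features(["img1.jpg", "img2.jpg"])  # your file list here
# DBSCAN does not need the number of clusters up front; eps needs tuning
labels = DBSCAN(eps=10.0, min_samples=2).fit_predict(features)  # -1 = noise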
So here is my question:
I want to make my very own dataset using a motion capture camera system to get the ground-truth poses and one RGB camera to get images, and then train/test a ConvNet using this as input.
I have looked around at other datasets for TensorFlow, Caffe and Matlab. I have viewed the MNIST, Cats/Dogs, Iris, LSP, HumanEva, Human3.6M, FLIC, etc. datasets and tried to understand their data as best I can. I have also seen people online making their own datasets. The thing is, when you use their datasets as examples, you usually download a .txt file that already contains the labels.
If anyone could explain to me how to use the image data together with the labels to feed into my network, it would be a tremendous help. I have written TensorFlow code before that inputs a .txt file into a network and gets the correct predicted output, but my brain is missing something about how to input an image with a label. How do I create that dataset?
Your input images and your labels are two separate variables. You will be writing separate bits of code to import them. The videos typically need to be converted to JPG files (it's a royal pain to read video files directly, mostly because you can't easily skip around in a video).
Probably the easiest way to structure your data is via a CSV that contains filename, poseinfoA, poseinfoB, etc., where the filename refers to the JPG image on disk.
To get started on the basics, I suggest looking at the aymericdamien tutorial examples; I haven't found tutorials anywhere else that are as clear and concise.
https://github.com/aymericdamien/TensorFlow-Examples
Those examples don't go into detail on the data input pipeline though. To set up a good data input pipeline in TensorFlow, I suggest you use the new (as of TF 1.4) Dataset object. It will force you into a good data input pipeline workflow, and it's the way all data input is going in TensorFlow, so it's worth learning. It's also easy to test and debug when you write it this way. Here's the guide you want to follow.
https://www.tensorflow.org/programmers_guide/datasets
You can start your Dataset object from the CSV and use dataset.map() to load the images with tf.image.decode_jpeg.
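For illustration, a minimal sketch of that pipeline with the current tf.data API (this answer predates TF 2, so some names have since moved under tf.io; the CSV layout and filenames are placeholders):

import tensorflow as tf

# hypothetical CSV layout: filename,poseA,poseB
def parse_row(line):
    filename, pose_a, pose_b = tf.io.decode_csv(line, record_defaults=["", 0.0, 0.0])
    img = tf.io.decode_jpeg(tf.io.read_file(filename), channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)  # scale to [0, 1]
    img = tf.image.resize(img, [224, 224])  # fixed size so batching works
    return img, tf.stack([pose_a, pose_b])

dataset = (tf.data.TextLineDataset("labels.csv")  # placeholder filename
           .skip(1)        # skip the CSV header row
           .map(parse_row)
           .shuffle(1000)
           .batch(32))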
Since you're doing pose estimation I'll also suggest a nice blog I came across recently that will probably interest you. The topic is segmentation, but pose estimation is quite related.
http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review
I am very new to machine learning and have been implementing ML algorithms on the datasets.
But how do I go about classifying images using ML algorithms?
How do I feed the images to the learning models in the form of numpy arrays?
Can anyone brief me on the steps involved? I have been reading about feature extraction, but I am not able to figure out how to do it.
Image classification is not much different, at its core, from any other sort of classification.
Your data are images, right? Well, we need to create some variables ("features") from those images in order to get a sense of what's in them. Computers understand matrices, not straight-up images the way humans do (although there are arguments that what humans do when they see images is deconstruct them into patterns of pixels, but let's keep it simple). Using OpenCV is a great way to turn image pixels into matrices.
Each matrix (i.e. each image) will have a corresponding tag or classification (e.g. "dog" or "cat"). You feed those matrices through your algorithm in order to classify each image.
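As a toy version of that loop (assuming OpenCV and scikit-learn; the file lists are placeholders, and raw pixels are the crudest possible features, real systems learn better ones):

import cv2
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical file lists; replace with your own paths
dog_paths = ["dog1.jpg", "dog2.jpg"]
cat_paths = ["cat1.jpg", "cat2.jpg"]

def to_feature_vector(path, size=(32, 32)):
    img = cv2.imread(path)        # BGR pixel matrix
    img = cv2.resize(img, size)   # same shape for every image
    return img.flatten() / 255.0  # one row of features per image

X = np.array([to_feature_vector(p) for p in dog_paths + cat_paths])
y = np.array([0] * len(dog_paths) + [1] * len(cat_paths))  # 0 = dog, 1 = cat

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:1]))  # predicted class for the first image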
That will get you started. There's so much that goes into machine learning with images, but at its core the problem is the same as elsewhere: take a matrix/set of data, use an algorithm to find patterns in it, and learn a function that maps the input to the output label. You might be well served by reading an intro machine learning book or taking a course.
I have been working on the MNIST dataset to learn how to use TensorFlow and Python for my deep learning course.
I could read the data internally/externally and train both a softmax model and a CNN, thanks to the TensorFlow tutorials on the website. In the end I got >90% accuracy with softmax and >98% with the CNN.
My problem is that I want to resize all MNIST images to 14x14 and train again, and also augment the whole dataset (noise, rotation, etc.) and train again. In the end, I want to compare the accuracies of these three different datasets.
Could you please help me solve this? How do I resize all the images, and how should the model change?
Thanks!
One way to resize images is to use scipy's imresize function:
from scipy.misc import imresize
img = imresize(yourimage, (14, 14))
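Note that scipy.misc.imresize is deprecated and was removed in SciPy 1.3, so that import will fail on current versions; a drop-in alternative (assuming Pillow is installed and yourimage is a uint8 NumPy array) is:

import numpy as np
from PIL import Image

# Pillow's resize takes (width, height); identical here since the target is square
img = np.array(Image.fromarray(yourimage).resize((14, 14)))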
But my real advice is that you should take a look at the Kadenze course "Creative Applications of Deep Learning". This is a notebook for lecture two: https://github.com/pkmital/CADL/blob/master/session-2/lecture-2.ipynb
This course is really good at helping you understand using images and Tensorflow.
What you need is an image processing library like OpenCV, PIL, etc. If you are using the dataset downloaded through TensorFlow, it will be a 3D array (an array of 2D arrays, one per image) or have more dimensions depending on how it's stored (I'm not sure). You can treat NumPy arrays as images and use them with any image processing library you like, but check what datatype they are in and whether it's compatible with the libraries you are using.
TensorFlow also has such functions, if you want to keep it all in TensorFlow.
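For example, to stay entirely in TensorFlow (a rough sketch with the current tf.image API; assumes a float32 batch of shape (N, 28, 28, 1) with values in [0, 1]):

import tensorflow as tf

def resize_to_14(images):
    # bilinear resize of the whole batch from 28x28 down to 14x14
    return tf.image.resize(images, [14, 14])

def augment(images):
    images = tf.image.rot90(images, k=1)                     # rotate 90 degrees
    noise = tf.random.normal(tf.shape(images), stddev=0.05)  # additive Gaussian noise
    return tf.clip_by_value(images + noise, 0.0, 1.0)

Training the same architecture on each variant then gives you the three accuracies to compare; for the 14x14 case, any layer shapes hard-coded for 28x28 inputs have to change accordingly.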