I'm trying to apply the machine learning algorithms available in Python's scikit-learn package to predict doodle names from a set of doodle images.
Since I'm a complete beginner in machine learning and have no knowledge of how neural networks work yet, I wanted to try scikit-learn's algorithms.
I've downloaded doodles (of cats and guitars) with the help of an API named quickdraw.
Then I load the images with the following code:
import numpy as np
from PIL import Image
import random

# To hold image arrays
images = []
# 0-cat, 1-guitar
target = []

# 5000 images of cats and guitar each
for i in range(5000):
    # cat images are named like cat0.png, cat1.png ...
    img = Image.open('data/cats/cat' + str(i) + '.png')
    img = np.array(img)
    img = img.flatten()
    images.append(img)
    target.append(0)
    # guitar images are named like guitar0.png, guitar1.png ...
    img = Image.open('data/guitars/guitar' + str(i) + '.png')
    img = np.array(img)
    img = img.flatten()
    images.append(img)
    target.append(1)

random.shuffle(images)
random.shuffle(target)
Then I applied the algorithm:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(images,target,test_size=0.2, random_state=0)
from sklearn.naive_bayes import GaussianNB
GB = GaussianNB()
GB.fit(X_train,y_train)
print(GB.score(X_test,y_test))
Upon running the above code (with other algorithms like SVM and MLP too), my system just freezes and I have to do a force shutdown to get it back. I'm not sure why this is happening.
I have tried lowering the number of images to load by changing
for i in range(5000):
to
for i in range(1000):
But then I only get an accuracy of around 50%.
First of all, if I may say so:
"Since I'm a complete beginner in machine learning and have no knowledge of how neural networks work yet, I wanted to try scikit-learn's algorithms."
This is not a good way to approach ML in general. I strongly suggest you start by studying at least the basics; otherwise you won't be able to tell what's going on at all (it's not something you can figure out just by trying things).
Back to your problem: applying Naive Bayes methods to raw images is not a good strategy. The problem is that each pixel of your image is a feature, and with images you can very easily end up with a very high number of dimensions (also, assuming each pixel is independent of its neighbours, which is what Naive Bayes does, is not what you want).
NB is commonly used with documents, and looking at this example on Wikipedia might help you understand the algorithm a bit better.
In short, NB boils down to computing joint conditional probabilities, which boils down to counting co-occurrences of features (words in Wikipedia's example, pixels in your case), which in turn boils down to computing a huge matrix of occurrences that you need in order to formulate your NB model.
Now, if your matrix is made of all the words in a set of documents, this can get pretty expensive in both time and space (O(n^2)/2, with n being the number of features); instead, imagine the matrix being composed of ALL the pixels in your training set, as in your example... it explodes really fast.
That's why cutting your dataset down to 1000 images lets your PC avoid running out of memory.
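To get a feel for the scale, here's a rough back-of-the-envelope sketch (the 255x255 image size is just an assumption about the quickdraw package's default PNG output; plug in your actual dimensions):

# Rough memory estimate for the design matrix alone (sizes are assumptions)
n_images = 10000            # 5000 cats + 5000 guitars
n_features = 255 * 255      # one feature per pixel after flatten()
bytes_per_value = 8         # scikit-learn casts inputs to float64 internally
gb = n_images * n_features * bytes_per_value / 1e9
print(round(gb, 1), "GB")   # roughly 5 GB before fitting even starts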
Hope it helps.
I'm trying to get human pose information from low-resolution images. In particular, I've tried the Keras OpenPose implementation by michalfaber, but the model does not seem to perform well on low-resolution images while performing pretty well on higher resolutions. I posted this as an issue on the GitHub repo as well, but I thought I'd try here too since I'm not set on that particular implementation of human pose detection.
My images are about 50-100 pixels in width and height. This is an example of such an image. I wonder if anyone knows a way to modify the program or the network, or knows of a human pose network that performs well on such low-resolution images.
If you are looking for a different human pose estimation network, I would highly recommend the MXNet GluonCV framework (https://gluon-cv.mxnet.io/model_zoo/pose.html). It is very simple to use and also contains many different pose estimation networks that you can try, comparing the tradeoff between accuracy and speed. For example, you can use it like this (taken from the tutorial page):
from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils
from gluoncv.data.transforms.pose import detector_to_alpha_pose, heatmap_to_coord_alpha_pose

# Load a pretrained person detector and a pretrained AlphaPose estimator
detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_net = model_zoo.get_model('alpha_pose_resnet101_v1b_coco', pretrained=True)

# Note that we can reset the classes of the detector to only include
# human, so that the NMS process is faster.
detector.reset_class(["person"], reuse_weights=['person'])

# Download and preprocess a test image
im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/pose/soccer.png?raw=true',
                          path='soccer.png')
x, img = data.transforms.presets.yolo.load_test(im_fname, short=512)
print('Shape of pre-processed image:', x.shape)

# Detect people, then estimate their poses from the detections
class_IDs, scores, bounding_boxs = detector(x)
pose_input, upscale_bbox = detector_to_alpha_pose(img, class_IDs, scores, bounding_boxs)
predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord_alpha_pose(predicted_heatmap, upscale_bbox)
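If you also want to see the result, GluonCV ships a keypoint plotting helper; a minimal sketch, assuming the variables from the snippet above:

# Draw the predicted keypoints on top of the detected persons
ax = utils.viz.plot_keypoints(img, pred_coords, confidence,
                              class_IDs, bounding_boxs, scores,
                              box_thresh=0.5, keypoint_thresh=0.2)
plt.show()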
For accuracy comparison, their AlphaPose with ResNet-101 network, for example, is significantly more accurate than OpenPose (you can find more accuracy benchmarks at the link above). One caveat, however, is understanding the difference between the types of these networks, such as bottom-up versus top-down approaches, since this can affect inference speed in different scenarios.
For example, the runtime of top-down approaches is proportional to the number of detected people, so it can be time-consuming if your image contains a crowd.
I am trying to perform PCA on an image dataset with 100,000 images, each of size 224x224x3.
I was hoping to project the images into a space of dimension 1000 (or somewhere around that).
I am doing this on my laptop (16gb ram, i7, no GPU) and already set svd_solver='randomized'.
However, fitting takes forever. Is the dataset and the image dimension just too large or is there some trick I could be using?
Thanks!
Edit:
This is the code:
pca = PCA(n_components=1000, svd_solver='randomized')
pca.fit(X)
Z = pca.transform(X)
X is a 100000 x 150528 matrix whose rows represent a flattened image.
You should really reconsider your choice of dimensionality reduction if you think you need 1000 principal components. With that many, you no longer have interpretability, so you might as well use other, more flexible dimensionality reduction algorithms (e.g. variational autoencoders, t-SNE, kernel PCA). A key benefit of PCA is the interpretability of the principal components.
If you have a video stream of the same place, then you should be fine with <10 components (though principal component pursuit might be better). Moreover, if your image dataset does not consist of similar-ish images, then PCA is probably not the right choice.
Also, for images, nonnegative matrix factorisation (NMF) might be better suited. For NMF, you can perform stochastic gradient optimisation, subsampling both pixels and images for each gradient step.
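If you want to try that route, here is a minimal sketch with scikit-learn's NMF (note this is the plain batch solver, not the stochastic pixel/image subsampling described above, and the component count is just an assumption):

from sklearn.decomposition import NMF

# X: (n_images, n_pixels) matrix of non-negative pixel intensities
nmf = NMF(n_components=50, init='nndsvda', max_iter=200)
W = nmf.fit_transform(X)   # per-image coefficients
H = nmf.components_        # per-component "basis images"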
However, if you still insist on performing PCA, then I think the randomised solver provided by Facebook is your best shot. Run pip install fbpca and then run the following code:
from fbpca import pca
# load data into X
U, s, Vh = pca(X, 1000)
It's not possible to get faster than that without utilising some matrix structure, e.g. sparsity or block composition (which your dataset is unlikely to have).
Also, if you need help picking the correct number of principal components, I recommend using this code:
import numpy as np
import fbpca
from bisect import bisect_left

def compute_explained_variance(singular_values):
    return np.cumsum(singular_values**2) / np.sum(singular_values**2)

def ideal_number_components(X, wanted_explained_variance):
    singular_values = fbpca.svd(X, compute_uv=False)  # This line is a bottleneck.
    explained_variance = compute_explained_variance(singular_values)
    return bisect_left(explained_variance, wanted_explained_variance)

def auto_pca(X, wanted_explained_variance):
    num_components = ideal_number_components(X, wanted_explained_variance)
    return fbpca.pca(X, num_components)  # This line is a bottleneck if the number of components is high
Of course, the above code doesn't support cross validation, which you really should use to choose the correct number of components.
You can try other values of the svd_solver parameter (the valid options are 'auto', 'full', 'arpack' and 'randomized'); training may be faster with a different setting.
You could also try to use:
from sklearn.decomposition import FastICA
which is more scalable.
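A minimal sketch of that suggestion (the component count mirrors the PCA target above; X is the image matrix from the question):

from sklearn.decomposition import FastICA

ica = FastICA(n_components=1000)
Z = ica.fit_transform(X)   # rows are images, columns are independent components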
A last-resort solution could be to turn your images black & white to reduce the dimension by a factor of 3; this might be a good step if your task is not colour-sensitive (for instance optical character recognition).
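A minimal sketch of that conversion, assuming a hypothetical file path:

from PIL import Image
import numpy as np

img = Image.open('some_image.png').convert('L')   # 'L' = single-channel grayscale
x = np.asarray(img, dtype=np.float32).flatten()   # 224*224 features instead of 224*224*3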
Try experimenting with the iterated_power parameter of PCA.
I'm working with TensorFlow, but I'm pretty new to Python and machine learning. If I have a tensor of an image from my input pipeline, what would be the best way to train on it? At a basic level, how would I handle passing data through? I have a structure I would like to use (I know I can get certain data from certain things like tensors), but I'm just not sure how to do so.
I'm very new to this, so all help would be greatly appreciated.
def model(image_tensor):
    tf.summary.image(img)
    return predictions

def loss(predictions, labels):
    return some_loss

def train(some_loss):
    return train_op
TensorFlow may be a bit complicated for someone new to machine learning and Python. My advice is to go through the excellent notebook tutorials on the TensorFlow site and start to understand the abstraction.
However, before that, I would use Python with numpy (and sometimes scipy) to implement basic machine learning methods like stochastic gradient descent, just to ensure that you understand how the algorithms work. Then implement a simple logistic regression.
So why do I ask you to do all that? Well, because once you get a good handle on how to work with machine learning algorithms, and how tedious it can be to find the gradients by hand, you will understand why TensorFlow's abstraction is useful.
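For instance, a toy numpy sketch of logistic regression trained with plain stochastic gradient descent might look like this (X and y are assumed to be a feature matrix and 0/1 labels):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic_regression(X, y, lr=0.1, epochs=10):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in np.random.permutation(len(X)):
            p = sigmoid(X[i] @ w + b)   # predicted probability for one sample
            grad = p - y[i]             # gradient of the log-loss w.r.t. the logit
            w -= lr * grad * X[i]       # one SGD step on the weights
            b -= lr * grad              # and on the bias
    return w, b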
I'm going to provide you with some simple examples dealing with handwritten digits (scikit-learn's small MNIST-like digits dataset).
from sklearn.datasets import load_digits
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

# Load only the digits 0 and 1 from the dataset
X, y = load_digits(n_class=2, return_X_y=True)

print("y [shape: {}] : {}".format(y.shape, y[:10]))
print("X [shape: {}]".format(X.shape))
What I've essentially done above is load two digits (0 and 1) from the digits dataset and display the shapes of the vector y and the matrix X.
If you want to see how the images look, you can run plt.imshow(X[0].reshape([8, 8])).
The next step is to define our placeholders and variables:
input_x = tf.placeholder(tf.float32,shape=[None,X.shape[1]], name = "input_x")
input_y = tf.placeholder(tf.float32,shape=[None,],name = "labels")
weights = tf.Variable(initial_value = tf.zeros(shape=[X.shape[1],1]), name="weights")
b = tf.Variable(initial_value=0.0, name = "bias")
What we have done here is define two placeholders in TensorFlow and tell them what shape of input to expect. I also gave the placeholders names for debugging purposes.
prediction_y = tf.squeeze(tf.nn.sigmoid(tf.add(tf.matmul(input_x, weights), tf.cast(b, tf.float32))))
loss = tf.losses.log_loss(input_y, prediction_y)
optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)
There you go, that's logistic regression in TensorFlow. What the last block does is apply the sigmoid activation to our inputs, define the loss function, and then define an optimizer for that loss.
The final step is to run it.
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

s = tf.InteractiveSession()
s.run(tf.global_variables_initializer())
for i in range(10):
    s.run(optimizer, {input_x: X_train, input_y: y_train})
    loss_i = s.run(loss, {input_x: X_train, input_y: y_train})
    print("loss at iteration {}: {}".format(i, loss_i))
That's essentially how you run your data through TensorFlow. This code may have typos; I don't have Python on this machine, so I'm writing from memory. However, the basic idea is there. Hope this helps.
Edit: Also, since you asked about the best way to train on image data: my answer would be that there isn't a single "best". Building a CNN is a typical approach that you may want to experiment with, assuming you have a large number of labelled images. Prior to that, people also used support vector machines relatively successfully for classifying images.
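If you do go the CNN route later, a minimal tf.keras sketch (assuming 28x28 grayscale images and 10 classes; adjust the input shape and class count to your data) could look like this:

import tensorflow as tf

# Small convolutional network for image classification
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=5)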
I am very new to machine learning and have been implementing ML algorithms on some datasets.
But how do I go about classifying images using ML algorithms?
How do I feed the images to the learning models in the form of numpy arrays?
Can anyone brief me about the steps involved? I have been reading about feature extraction but I am not able to figure out how to do that.
Image classification is not much different, at its core, from any other sort of classification.
Your data are images, right? Well, we need to create some variables ("features") from those images in order to get a sense of what's in them. Computers understand matrices, not straight-up images the way humans do (although there are arguments that what humans do when they see images is deconstruct them into patterns of pixels, but let's keep it simple). Using OpenCV is a great way to turn image pixels into matrices.
Each matrix (i.e. each image) will have a corresponding tag or classification (e.g. "dog" or "cat"). You feed those matrices through your algorithm in order to classify each image.
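A minimal sketch of that pipeline (the file paths and labels are hypothetical; any scikit-learn classifier works in the last step):

import cv2
import numpy as np
from sklearn.linear_model import LogisticRegression

paths = ['dog1.jpg', 'dog2.jpg', 'cat1.jpg', 'cat2.jpg']
labels = [0, 0, 1, 1]                      # 0 = dog, 1 = cat

X = []
for p in paths:
    img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (64, 64))        # same size -> same number of features
    X.append(img.flatten())
X = np.array(X)

clf = LogisticRegression(max_iter=1000).fit(X, labels)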
That will get you started. There's so much that goes into machine learning related to images, but at its core, the problem is the same as elsewhere: take a matrix/set of data and use an algorithm to find patterns in the data and a function that maps the input to the output label. You might be served well by reading an intro to machine learning book or taking a course.
I am trying to perform image segmentation using machine learning (SVM in particular). I am segmenting MRIs and the original images are 512x512x100. I have created 78 features per image. At that image size and number of features I quickly run out of memory.
To resolve the memory issue I have done a couple of things: 1) I downsampled the images to 256x256x50, and 2) I reduced the precision to 16-bit float, as the original image is 16-bit, so I didn't believe more precise data was necessary. (Maybe I'm wrong here.)
So I was able to reduce my data to an amount that can be held in memory (6 GB), until I went to actually use the SVM function in sklearn and my computer quickly started using swap, as it had run out of RAM (16 GB). I went searching a bit and found in the sklearn docs (http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) that "If X and y are not C-ordered and contiguous arrays of np.float64 and X is not a scipy.sparse.csr_matrix, X and/or y may be copied." This, along with other posts on GitHub, made me realize the data was being cast up to float64 and therefore taking up all my memory, as going from 16-bit to 64-bit would, from what I gather, increase the RAM usage from 6 GB to 24 GB... which goes beyond what I have available.
Here is a simple example of the code. features is a numpy array of 39,321,600 rows (256*256*50*12 [training images]) by 78 columns (the features), and segmentations is 39,321,600 by 1 with values between 0 and 6 for the various regions of interest.
from sklearn import svm
clf = svm.SVC()
clf.fit(features, segmentations)
Above is the only code that is relevant at this point as I haven't gotten past the training portion.
Any help with either training a dataset of this size using SVM and sklearn, or any other options would be greatly appreciated.
Thanks.
Anthony.
PS: I have tried subsampling the data as an option, though this is not ideal as I would like to use the whole image. If this is my best bet, I guess I will pursue it.