I am implementing an autoencoder to reconstruct color images. The loss function I want to use requires a reduced color set (at most ~100 different colors), but I am struggling to find a suitable differentiable algorithm.
Another doubt I have: is it better to apply such quantization directly in the loss function, or can I implement it in a custom non-trainable layer? In the second case, does the algorithm need to be differentiable?
My first idea for approaching this problem was to quantize the images before feeding them to the network, but I don't know how to "force" the network to produce only the quantized colors as output.
Any suggestion is greatly appreciated; I do not need code, just some ideas or new perspectives. Being pretty new to TensorFlow, I am probably missing something.
If you want to compress the image, it seems you are looking for a discrete color set to use for compression. In that case an auto-encoder is not the most suitable approach.
A general auto-encoder compresses a tensor of images (B x C x H x W) into a latent code for each image (B x D, typically D = 512). The beauty of this approach is that the optimal latent space is found 'automatically'.
Nevertheless, if you want to use TensorFlow's gradient-based optimization tools, a continuous relaxation technique such as interpolation could be helpful (see the sketch at the end of this answer).
In the following paper, they utilize continuous relaxation for discrete path selection in a neural network:
Liu, H., Simonyan, K., & Yang, Y. (2018). DARTS: Differentiable architecture search. ICLR.
In the following paper, they utilize interpolation to learn a quantized kernel bank stored in a look-up table:
Jo, Y., & Kim, S. J. (2021). Practical single-image super-resolution using look-up table. CVPR.
Both papers provide code.
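To make the relaxation idea more concrete, here is a minimal sketch (my own, not taken from either paper) of a differentiable "soft" quantization that could sit in a custom layer or inside the loss. The palette size, the temperature, and the function name are illustrative assumptions:

```python
import tensorflow as tf

def soft_quantize(images, palette, temperature=0.1):
    # images:  [B, H, W, 3] floats in [0, 1]
    # palette: [K, 3] allowed colors (e.g. K = 100)
    # Squared distance from every pixel to every palette color.
    diff = images[..., tf.newaxis, :] - palette          # [B, H, W, K, 3]
    dist = tf.reduce_sum(tf.square(diff), axis=-1)       # [B, H, W, K]
    # Soft assignment: as temperature -> 0 this approaches hard quantization.
    weights = tf.nn.softmax(-dist / temperature, axis=-1)
    # Each output pixel is a differentiable mixture of palette colors.
    return tf.einsum('bhwk,kc->bhwc', weights, palette)

# The palette could be fixed (e.g. from k-means on the training images)
# or a tf.Variable learned jointly with the autoencoder.
palette = tf.Variable(tf.random.uniform([100, 3]))
```

At inference time you could swap the softmax for a hard argmin over the palette to get a truly quantized output.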
How can I use a Multilayer Perceptron for clustering, like K-Means, on a non-labeled dataset?
I have the MNIST dataset with labels, but I wanted to perform a clustering algorithm with an MLP.
Any idea?
Edit: if the problem is restricted to using an MLP exclusively, I think you're looking for differentiable objectives for clustering (the K-Means objective is not differentiable because of the hard cluster-assignment step). I think this is not a 'mainstream' approach to clustering, but there certainly seems to be some work on using deep networks to optimize (differentiable) clustering objectives:
Differentiable Deep Clustering with Cluster Size Constraints: "we exploit the connection between optimal transport and k-means, and rely on entropic regularization to derive a fully-differentiable clustering loss that can be used in (P) and directly optimized with SGD". So you can apply SGD to an MLP. Is an MLP the best architecture for using this loss? That depends on your data.
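For intuition, here is a minimal sketch of what a differentiable relaxation of the k-means objective can look like. This is my own simplification, not the optimal-transport loss from the paper; the temperature and the tensor shapes are assumptions:

```python
import tensorflow as tf

def soft_kmeans_loss(embeddings, centroids, temperature=0.5):
    # embeddings: [N, D] outputs of the MLP
    # centroids:  [K, D] trainable cluster centers (a tf.Variable)
    # Squared distance between every point and every centroid: [N, K]
    dist = tf.reduce_sum(
        tf.square(embeddings[:, tf.newaxis, :] - centroids), axis=-1)
    # Replace the hard argmin assignment with a softmax so the loss is
    # differentiable w.r.t. both the MLP parameters and the centroids.
    weights = tf.nn.softmax(-dist / temperature, axis=-1)
    # Expected within-cluster squared distance under the soft assignment.
    return tf.reduce_mean(tf.reduce_sum(weights * dist, axis=-1))
```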
Another approach I could think of using ANNs is self-organizing maps (or Kohonen maps). It depends on how relaxed your definition of an MLP is; you can certainly add a bunch of layers between the input layer and the output feature maps.
You can potentially use an MLP to embed your data into a vector space, and then compute some metric on those embeddings (e.g. Euclidean distance) during K-Means. Whether that makes sense depends on how you compute the embeddings and on the dataset.
You could do this with an autoencoder in the absence of labels, though that is a bit more complex than a simple MLP.
This could be overkill though; it really depends on the problem. Consider doing K-Means on your raw data first (no MLP). If the problem is complicated enough, moving the data to a latent space could work; this is essentially what word2vec does, and people do clustering and all sorts of things with it (see this). A rough sketch of the autoencoder-plus-K-Means idea is below.
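A minimal sketch of the embed-then-cluster idea for flattened MNIST. Layer sizes, epoch count, and the choice of 10 clusters are arbitrary assumptions, and the training calls are commented out since x_train is not defined here:

```python
import tensorflow as tf
from sklearn.cluster import KMeans

inputs = tf.keras.Input(shape=(784,))            # flattened 28x28 image
encoded = tf.keras.layers.Dense(128, activation='relu')(inputs)
encoded = tf.keras.layers.Dense(32, activation='relu')(encoded)
decoded = tf.keras.layers.Dense(128, activation='relu')(encoded)
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(decoded)

autoencoder = tf.keras.Model(inputs, decoded)    # trained without labels
encoder = tf.keras.Model(inputs, encoded)        # used only for the embedding
autoencoder.compile(optimizer='adam', loss='mse')

# x_train: [N, 784] array scaled to [0, 1]; the labels are never used.
# autoencoder.fit(x_train, x_train, epochs=20, batch_size=256)
# latent = encoder.predict(x_train)
# clusters = KMeans(n_clusters=10).fit_predict(latent)   # cluster in latent space
```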
I am attempting to do a two-fold task. The input is an image and based on the input I want to pick another image from a set of images (classification task) and then use both the images to obtain an output tensor. Clearly, I can train both the models separately if I know the ground truth of which image I should pick from that set. But, I only have the output tensor ground truth.
The problem, as it appears to me, is that if we employ a classification layer, the selection is no longer differentiable, so gradients cannot flow through it. How do I deal with this problem? Is there literature which uses this kind of architecture for any application? TIA
More details: I have multiple images of an object/scene and I want to use two of those images for some kind of reconstruction problem. To maximize the performance of the reconstruction, I want to smartly choose the second image given the first image. For example, I have three images A, B, C and using AC gives the best result. I need a model which, given A, predicts C, and then using AC I can achieve the reconstruction. I do not have ground truth which says AC is better than AB. Is the task clear now?
So basically, you want to do a classification task followed by a reconstruction task.
Here is what I suggest (I do not pretend this is the absolute best solution, but it's how I would approach this problem):
You can create a single task that does Classification --> Reconstruction with a single loss. Let's still separate this network in two and call net_class the part that does classification, and net_reconstruct the part performing reconstruction.
Let's say your classification network predicts {'B': 0.1, 'C': 0.9}. Instead of using only image 'C' for reconstruction, I would feed both pairs (A-B and A-C) to the second network and compute a reconstruction loss L for each (I'm not an expert in reconstruction, but I guess there are some classical losses for this).
Therefore, you would compute two losses L(A-B) and L(A-C).
My total loss would be 0.1 * L(A-B) + 0.9 * L(A-C). This way, you train net_class to choose the pairing that minimizes the reconstruction loss while still training net_reconstruct to minimize both losses, and the loss is continuous (and therefore, differentiable according to AI experts ;) ). A short sketch of this weighted loss appears after the list below.
The idea behind this loss is three-fold :
1 - Improving the reconstructor makes the loss go down (since both L(A-B) and L(A-C) would decrease). Therefore, this loss should make your reconstructor converge towards something you want.
2 - Let's imagine your reconstructor is pretty much trained (L(A-B) and L(A-C) are relatively low). Then, your classifier has an incentive to predict the class which has the lowest reconstruction loss.
3 - Now, your reconstructor and your classifier will train at the same time. You can expect, at the end of training, a classifier that outputs pretty much binary results (like 0.998 vs 0.002). At that point, your reconstructor will almost only train on the scene associated with the 0.998 output. This should not be a problem since, if I understood your problem correctly, you want to perform the reconstruction only for the top classified scene.
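Here is the small sketch of the weighted loss mentioned above. Everything in it is a placeholder: reconstruction_loss stands for whatever loss you compute with net_reconstruct for one pair, and class_probs is the softmax output of net_class over the candidate images:

```python
import tensorflow as tf

def weighted_reconstruction_loss(image_a, candidates, class_probs,
                                 reconstruction_loss):
    # candidates:  list of candidate second images, e.g. [B, C]
    # class_probs: [num_candidates] softmax output of net_class
    # Compute the reconstruction loss for every pair (A, candidate).
    losses = tf.stack([reconstruction_loss(image_a, c) for c in candidates])
    # e.g. 0.1 * L(A-B) + 0.9 * L(A-C); gradients reach both networks.
    return tf.reduce_sum(class_probs * losses)
```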
Note that this method also works if you're not performing deep learning for the reconstruction part.
If you want some inspiration on this kind of topic, I recommend you read some blog posts about GANs (Generative Adversarial Networks). They use the same two-stage, one-loss trick (with some slight differences of course, but the ideas are very close).
Good luck!
I already asked this question here: Can Convolutional Neural Networks (CNN) be represented by a Mathematical formula? but I feel that I was not clear enough and also the proposed idea did not work for me.
Let's say that, using my computer, I train a certain machine learning algorithm (e.g. naive Bayes, decision tree, linear regression, or others). So I already have a trained model to which I can give an input value and it returns the result of the prediction (e.g. 1 or 0).
Now, let's say that I still want to give an input and get a predicted output. However, this time I would like my input value to be, for example, multiplied by some sort of mathematical formula, weights, or matrix that represents my "trained model".
In other words, I would like my trained model to be "transformed" into some sort of formula to which I can give an input and get the predicted number.
The reason I want to do this is that I want to train on a big dataset with a complex prediction model, and then use this trained prediction model on simpler hardware such as a PIC32 microcontroller. The PIC32 microcontroller would not train the machine learning model or store all the inputs. Instead, the microcontroller would simply read certain numbers from the system, apply a math formula or some sort of matrix multiplication, and give me the predicted output. With that, I can use "fancy" neural networks on much simpler devices that can easily handle math formulas.
If I read this properly, you want a generally continuous function in many variables to replace a CNN. The central point of a CNN existing in a world with ANNs ("normal" neural networks) is that it includes irruptive transformations: non-linearities, discontinuities, etc. that enable the CNN to develop recognitions and relationships that simple linear combinations -- such as matrix multiplication -- cannot handle.
If you want to understand this better, I recommend that you choose an introduction to Deep Learning and CNNs in whatever presentation mode fits your learning style.
Essentially, every machine learning algorithm is a parameterized formula, with your trained model being the learned parameters that are applied to the input.
So what you're actually asking is to simplify arbitrary computations to, more or less, a matrix multiplication. I'm afraid that's mathematically impossible. If you ever do come up with a solution to this, make sure to share it - you'd instantly become famous, most likely rich, and put a hell of a lot of researchers out of business. If you can't train a matrix multiplication to get the accuracy you want from the start, what makes you think you can boil down arbitrary "complex prediction models" to such simple computations?
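To make the "parameterized formula" point concrete, and to show why it stops short of a single matrix multiplication: for a small dense network, the trained model really is just stored weight matrices plus the nonlinear activations between them, and those are exactly what you would port to a microcontroller. The architecture below is an arbitrary, untrained example, just to show the mechanics:

```python
import numpy as np
import tensorflow as tf

# A small Keras model standing in for "the trained model" (sizes are arbitrary).
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(hidden)
model = tf.keras.Model(inputs, outputs)

# The "formula" is the learned weights plus the activation functions,
# so the forward pass can be re-implemented with plain matrix math.
(w1, b1), (w2, b2) = [l.get_weights() for l in model.layers if l.get_weights()]

def predict_by_hand(x):
    h = np.maximum(0.0, x @ w1 + b1)              # dense layer + ReLU
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))   # dense layer + sigmoid

x = np.random.rand(1, 4).astype(np.float32)
# predict_by_hand(x) matches model.predict(x) up to floating-point precision;
# the nonlinearities are also why it cannot collapse into one matrix multiplication.
```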
I was recently learning about neural networks and came across the MNIST data set. I understood that a sigmoid cost function is used to reduce the loss. Also, weights and biases get adjusted, and optimum weights and biases are found after training. The thing I did not understand is: on what basis are the images classified? For example, to classify whether a patient has cancer or not, data like age, location, etc. become the features. In the MNIST dataset, I did not find any of that. Am I missing something here? Please help me with this.
First of all, the network pipeline consists of 3 main parts:
1 - Input manipulation
2 - Parameters that affect the finding of the minimum
3 - Parameters like your decision function in your interpretation layer (often a fully connected layer)
In contrast to your regular machine learning pipeline, where you have to extract features manually, a CNN uses filters (filters like those used in edge detection or Viola-Jones).
As a filter runs across the image and is convolved with the pixels, it produces an output. This output is then interpreted by a neuron: if the output is above a threshold, it is considered valid (a step function counts 1 if valid; in the case of a sigmoid, it takes a value on the sigmoid curve).
The next steps are the same as before.
This proceeds until the interpretation layer (often a softmax). This layer interprets your computation (if the filters are well adapted to your problem you will get a good predicted label), which means you have a small difference between y_guess and y_true_label.
Now you can see that, for the guess of y, we have multiplied the input x by many weights w and also applied functions to it. This composition of functions is where the chain rule from calculus comes in.
To get better results, the effect of each single weight on the error must be known. Therefore, you use backpropagation, which computes the derivative of the error with respect to every w. The trick is that you can reuse intermediate derivatives, which is more or less what backpropagation is, and it becomes easier since you can use matrix-vector notation.
Once you have your gradient, you can use the normal concept of minimization where you walk along the direction of steepest descent (there are also many other gradient methods like Adagrad or Adam, etc.).
The steps are repeated until convergence or until you reach the maximum number of epochs.
So the answer is: THE COMPUTED WEIGHTS (FILTERS) ARE THE KEY TO DETECT NUMBERS AND DIGITS :)
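As a rough sketch of this pipeline in Keras (the filter counts and layer sizes are arbitrary choices, and the actual training call is commented out):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation='relu'),      # learned filters
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),       # interpretation layer
])

model.compile(optimizer='adam',                            # one of the gradient methods
              loss='sparse_categorical_crossentropy',      # difference of y_guess and y_true
              metrics=['accuracy'])

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0                       # input manipulation
# model.fit(x_train, y_train, epochs=5)                    # backprop + gradient descent
```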
I am trying to see the feasibility of using TensorFlow to identify features in my image data. I have 50x50px grayscale images of nuclei that I would like to have segmented; the desired output would be either a 0 or 1 for each pixel: 0 for the background, 1 for the nucleus.
Example input: raw input data
Example label (what the "label"/real answer would be): output data (label)
Is it even possible to use TensorFlow to perform this type of machine learning on my dataset? I could potentially have thousands of images for the training set.
A lot of the examples have a label corresponding to a single category, for example a 10-element array [0,0,0,0,0,0,0,0,0,0] for the handwritten digit data set, but I haven't seen many examples that would output a larger array. I would assume the label would be a 50x50 array?
Also, any ideas on the CPU processing time for this type of analysis?
Yes, this is possible with TensorFlow. In fact, there are many ways to approach it. Here's a very simple one:
Consider this to be a binary classification task. Each pixel needs to be classified as foreground or background. Choose a set of features by which each pixel will be classified. These features could be local features (such as a patch around the pixel in question) or global features (such as the pixel's location in the image). Or a combination of the two.
Then train a model of your choosing (such as a NN) on this dataset. Of course your results will be highly dependent upon your choice of features.
You could also take a graph-cut approach if you can represent that computation as a computational graph using the primitives that TensorFlow provides. You could then either not make use of TensorFlow's optimization functions such as backprop, or, if there are some differentiable variables in your computation, you could use TF's optimization functions to optimize those variables.
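As a minimal sketch of the per-pixel classification idea above (the patch size, layer sizes, and helper names are my own assumptions; training calls are commented out since no data is defined here):

```python
import numpy as np
import tensorflow as tf

PATCH = 5  # size of the local patch around each pixel (arbitrary choice)

def pixel_features(image):
    # One feature row per pixel: the surrounding PATCH x PATCH patch
    # (local feature) plus the pixel's normalized (row, col) position (global feature).
    padded = np.pad(image, PATCH // 2, mode='reflect')
    rows = []
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            patch = padded[r:r + PATCH, c:c + PATCH].ravel()
            rows.append(np.concatenate([patch, [r / 50.0, c / 50.0]]))
    return np.array(rows, dtype=np.float32)

# Per-pixel binary classifier; layer sizes are arbitrary.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(PATCH * PATCH + 2,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),   # background vs. nucleus
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# features = pixel_features(img)   # img:  50x50 array scaled to [0, 1]
# labels = mask.reshape(-1)        # mask: 50x50 array of 0/1
# model.fit(features, labels, epochs=10)
```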
SoftmaxWithLoss() works for your image segmentation problem if you reshape the predicted label map and the true label map from [batch, height, width, channel] to [N, channel].
In your case, your final predicted map will have channel = 2, and after reshaping, N = batch * height * width; then you can use SoftmaxWithLoss() or a similar loss function in TensorFlow to run the optimization.
See this question that may help.
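SoftmaxWithLoss() is a Caffe layer name; in TensorFlow the closest equivalent I know of is a softmax cross-entropy op applied after the reshape described above. A minimal sketch, assuming the shapes given in this answer (channel = 2):

```python
import tensorflow as tf

def pixelwise_softmax_loss(logits, labels):
    # logits: [batch, height, width, 2]  raw network output, 2 classes
    # labels: [batch, height, width]     integer 0/1 ground-truth mask
    # Flatten [batch, height, width, channel] -> [N, channel], N = batch * height * width
    flat_logits = tf.reshape(logits, [-1, 2])
    flat_labels = tf.reshape(labels, [-1])
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=flat_labels, logits=flat_logits)
    return tf.reduce_mean(loss)
```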
Try using convolutional filters for the model: a stack of convolution and downsampling layers. The input should be the normalized pixel image and the output should be the mask. The last layer should be a softmaxWithLoss. HTH.
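A rough Keras sketch of this suggestion (a softmax output plus a cross-entropy loss plays the role of softmaxWithLoss here; the filter counts and depth are arbitrary assumptions):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(50, 50, 1))                 # normalized pixel image
x = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
x = tf.keras.layers.MaxPooling2D()(x)                      # downsample to 25x25
x = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')(x)
x = tf.keras.layers.UpSampling2D()(x)                      # back up to 50x50
outputs = tf.keras.layers.Conv2D(2, 1, activation='softmax')(x)  # per-pixel class probabilities

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# model.fit(images, masks)   # images: [N, 50, 50, 1], masks: [N, 50, 50] of 0/1
```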