I am using Python 3.5 with TensorFlow 0.11.
I have a dataset with a large number of features (>5000) and a relatively small number of samples (<200). I am using the skflow wrapper function DNNClassifier for deep learning.
It seems to work well for the classification task, but I want to find the important features among this large number of features.
Internally, DNNClassifier seems to perform feature selection (or feature extraction). Is there any way to perform feature selection with TensorFlow?
Or is there some function to extract the weights of the features?
(There was a function DNNClassifier.weights_, but it seems to be deprecated.)
If TensorFlow does not support feature selection or weight information, would it be reasonable to do feature selection with another method (such as univariate feature selection) and then try deep learning?
Thank you for your help.
You can evaluate the weights.
For example, if your variable is defined by
weights = tf.Variable(np.ones([100, 10], dtype='float32'), name='weights')
you can get its value inside a TensorFlow session (with the variables initialized):
value = weights.eval()
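Once you have the weight matrix as a NumPy array, one rough way to rank input features is by the total absolute weight leaving each input. This is only a heuristic sketch: it assumes the array is a first-layer weight matrix shaped [n_features, n_hidden_units] and that the inputs were scaled comparably. The random array and the boosted row 7 are made up for the demo.

```python
import numpy as np

# Stand-in for the array returned by weights.eval(); in practice this would be
# the first-layer weight matrix, shaped [n_features, n_hidden_units].
rng = np.random.default_rng(0)
value = rng.normal(size=(100, 10)).astype("float32")
value[7] *= 50.0  # make feature 7 artificially dominant for the demo

# Heuristic importance score: total absolute outgoing weight per input feature.
importance = np.abs(value).sum(axis=1)

# Indices of the features with the largest scores, highest first.
top_k = 5
top_features = np.argsort(importance)[::-1][:top_k]
print(top_features)
```

Large weights do not prove that a feature matters, so treat the ranking as a starting point and confirm it on held-out data.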
I'm trying to cluster a dataframe with 36 features and a lot (88%) of zeros. This is my first ML job. I started with K-Means, but for any K I choose, 99.5% of my data ends up in cluster 0. I've tried PCA to reduce the number of features, but the same problem appeared.
Any thoughts on approaches I can try?
Have you tried techniques such as sequential feature selection? These are so-called 'wrapper methods' where you add (for forward selection) or eliminate (for backward elimination) one feature at a time and assess the performance of the model accordingly. I tend to use supervised learning models in my job but I believe you can use sequential selection algorithms to assess unsupervised models as well. I have used the sklearn library for this: https://scikit-learn.org/stable/modules/feature_selection.html
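As a minimal sketch of forward selection with sklearn, here is the supervised case I am familiar with. It assumes scikit-learn 0.24 or later (where SequentialFeatureSelector was added); the synthetic data, the logistic regression estimator, and the choice of 3 features are placeholders for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for a real dataset (assumption for the demo).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="forward",  # use "backward" for backward elimination
    cv=3,
)
selector.fit(X, y)

selected_mask = selector.get_support()  # boolean mask over the 8 columns
X_reduced = selector.transform(X)       # keeps only the selected columns
print(selected_mask)
```

For clustering you would need to replace the cross-validated classifier score with some unsupervised quality measure, which this sketch does not attempt.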
This tutorial describes how to build a TFF computation from a Keras model.
This tutorial describes how to build a custom TFF computation from scratch, possibly with a custom federated learning algorithm.
What I need is a combination of these: I want to build a custom federated learning algorithm, and I want to use an existing Keras model. How can it be done?
The second tutorial requires MODEL_TYPE, which is based on MODEL_SPEC, but I don't know how to get it. I can see some variables in model.trainable_variables (where model = tff.learning.from_keras_model(keras_model, ...)), but I doubt that is what I need.
Of course, I can implement the model by hand (as in the second tutorial), but I want to avoid it.
I think you have the correct pointers for writing a custom federated computation, as well as converting a Keras model to a tff.learning.Model. So we'll focus on pulling a TFF type signature from an existing tff.learning.Model.
Once you have your hands on such a model, you should be able to use tff.learning.framework.weights_type_from_model to pull out the appropriate TFF type to use for your custom algorithm.
There is an interesting caveat here: how precisely you use a tff.learning.Model in your custom algorithm is pretty much up to you, and this could affect your desired model weights type. This is unlikely to be the case (likely you will simply be assigning values from incoming tensors to the model variables), so I think we should prefer to avoid going deeper into this caveat.
Finally, a few pointers to end-to-end custom algorithm implementations in TFF:
One of the simplest complete examples TFF has is simple_fedavg, which is totally self-contained and contains instructions for running.
The code for a paper on Adaptive Federated Optimization contains a handwritten implementation of learning rate decay on the clients in TFF.
A similar implementation of adaptive learning rate decay (think Keras' functions to decay learning rate on plateaus) is right next door to the code for AFO.
I have been using pytorch a lot and got used to their dataloaders and transforms, in particular when it comes to data augmentation, as they're very user-friendly and easy to understand.
However, I need to run some ML models from sklearn.
Is there a way to use PyTorch's dataloaders with sklearn?
Yes, you can. You can do this with sklearn's partial_fit method. From the scikit-learn documentation:
6.1.3. Incremental learning
Finally, for 3. we have a number of options inside scikit-learn. Although all algorithms cannot learn
incrementally (i.e. without seeing all the instances at once), all
estimators implementing the partial_fit API are candidates. Actually,
the ability to learn incrementally from a mini-batch of instances
(sometimes called “online learning”) is key to out-of-core learning as
it guarantees that at any given time there will be only a small amount
of instances in the main memory. Choosing a good size for the
mini-batch that balances relevancy and memory footprint could involve
some tuning [1].
Not all algorithms can do this, however.
Then, you can use pytorch's dataloader to preprocess the data and feed it in batches to partial_fit.
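A minimal sketch of that loop, assuming in-memory synthetic data wrapped in a TensorDataset and SGDClassifier as the incremental estimator (any estimator that implements partial_fit should slot in the same way). Batch size and epoch count are arbitrary.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.linear_model import SGDClassifier

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Synthetic, linearly separable data standing in for a real dataset.
X = rng.normal(size=(200, 5)).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 0).astype(np.int64)

dataset = TensorDataset(torch.from_numpy(X), torch.from_numpy(y))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

clf = SGDClassifier(random_state=0)
classes = np.unique(y)  # partial_fit needs the full label set on the first call

for epoch in range(5):
    for xb, yb in loader:
        # Convert each mini-batch of tensors to NumPy before handing it to sklearn.
        clf.partial_fit(xb.numpy(), yb.numpy(), classes=classes)

accuracy = clf.score(X, y)
print(accuracy)
```

Any transforms defined on your own Dataset class run inside the DataLoader as usual, so the augmentation happens before the batch reaches partial_fit.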
I came across the skorch library recently and this could help you.
"The goal of skorch is to make it possible to use PyTorch with sklearn."
From the skorch docs:
class skorch.dataset.Dataset(X, y=None, length=None)
General dataset wrapper that can be used in conjunction with PyTorch DataLoader.
I guess you could use the Dataset class for wrapping your PyTorch DataLoader and use sklearn models. If you would like to use other PyTorch features like PyTorch Tensors you could also do that.
I am looking to develop a simple Neural Network in PyTorch or TensorFlow to predict one numeric value based on several inputs.
For example, if one has data describing the interior comfort parameters for a building, the NN should predict the numeric value for the energy consumption.
The documented examples and tutorials for both PyTorch and TensorFlow generally focus on classification and time-dependent series (which is not my case). Any idea which kind of NN available in those libraries is best for this kind of problem? I'm just looking for a hint about the type, not code.
Thanks!
The type of problem you are talking about is called a regression problem. In such types of problems, you would have a single output neuron with a linear activation (or no activation). You would use MSE or MAE to train your network.
If your problem is a time series (where you use previous values to predict the current or next value), then you could try multivariate time series forecasting using LSTMs.
If your problem is not time series, then you could just use a vanilla feed forward neural network. This article explains the concepts of data correlation really well and you might find it useful in deciding what type of neural networks to use based on the type of data and output you have.
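Although you only asked for a hint, the feed-forward regression setup is short enough to sketch in PyTorch. Everything here is an assumption for illustration: the synthetic data with 4 inputs, the single hidden layer of 16 units, and the Adam learning rate.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 4 input features, one numeric target.
X = torch.randn(256, 4)
true_w = torch.tensor([[1.5], [-2.0], [0.5], [3.0]])
y = X @ true_w + 0.1 * torch.randn(256, 1)

model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 1),  # single output unit, no activation (linear)
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Loss before any training, for comparison.
with torch.no_grad():
    initial_loss = loss_fn(model(X), y).item()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
print(initial_loss, final_loss)
```

Swapping nn.MSELoss for nn.L1Loss gives the MAE variant; in a real project you would also hold out a validation set rather than train on the full data.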
I need advice choosing a model and machine learning algorithm for a classification problem.
I'm trying to predict a binary outcome for a subject. I have 500,000 records in my data set and 20 continuous and categorical features. Each subject has 10-20 records. The data is labeled with its outcome.
So far I'm thinking of a logistic regression model and kernel approximation, based on the cheat-sheet here.
I am unsure where to start when implementing this in either R or Python.
Thanks!
Choosing an algorithm and optimizing its parameters is a difficult task in any data mining project, because both must be customized for your data and problem. Try different algorithms such as SVM, Random Forest, Logistic Regression, and KNN, evaluate each of them with cross-validation, and then compare the results.
You can use grid search (GridSearchCV) in scikit-learn to try different parameters and optimize them for each algorithm. Also try this project, which tests a range of parameters with a genetic algorithm.
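A minimal sketch of that comparison in Python, assuming synthetic data and two arbitrary candidates with small, made-up parameter grids. In your case the data, the estimators, and the grids would all differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the real records.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Each candidate pairs an estimator with the parameter grid to search.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(n_estimators=50, random_state=0),
        {"max_depth": [3, None]},
    ),
}

results = {}
for name, (estimator, param_grid) in candidates.items():
    # 3-fold cross-validation over every combination in the grid.
    search = GridSearchCV(estimator, param_grid, cv=3)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

for name, (score, params) in results.items():
    print(name, round(score, 3), params)
```

Since each subject has several records, you may want the folds split by subject so that the same subject does not appear in both training and validation data.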
Features
If your categorical features don't have too many possible different values, you might want to have a look at sklearn.preprocessing.OneHotEncoder.
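A small sketch of what the encoder does, using a hypothetical single column of color names:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column with three distinct values.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# handle_unknown="ignore" encodes unseen categories as all zeros at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(colors).toarray()  # sparse matrix -> dense array

print(encoder.categories_[0])  # categories in sorted order
print(encoded)
```

Each distinct value becomes its own 0/1 column, which is why this works best when the number of distinct values is small.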
Model choice
The choice of "the best" model depends mainly on the amount of available training data and the simplicity of the decision boundary you expect to get.
You can try dimensionality reduction to 2 or 3 dimensions. Then you can visualize your data and see if there is a nice decision boundary.
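For instance, a sketch with PCA as the reduction method. PCA is one choice among several; the synthetic data and the scaling step are assumptions for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real feature matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Standardize first so no single feature dominates the principal components.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```

You can then scatter-plot the two columns of X_2d colored by y to see whether the classes separate cleanly.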
With 500,000 training examples you can think about using a neural network. I can recommend Keras for beginners and TensorFlow for people who know how neural networks work.
You should also know that there are ensemble methods.
A nice cheat sheet on what to use is in the sklearn tutorial you already found (source: scikit-learn.org).
Just try it and compare the different results. Without more information it is not possible to give you better advice.