Hyper-parameter Optimisation in Cpp? - python

I need to fit a deep neural network to data coming from a data generating process, think of an AR(5). So I have five features per observation and one y for some large number N observations in each simulation. I am interested only in the root mean squared error of the best performing DNN in each simulation.
Since it's a simulation setting, I have to do a large number of these simulations and within each simulation fit a neural network to the data. The only reasonable way I can think of doing this is fit the DNN via hyper-parameter optimisation given each simulation (dlib's find_min_global will be my optimiser).
Does it make sense to do this exercise in C++ (slow development because I am not proficient) or Python (faster iteration because I am fairly proficient).
From where I am sitting, C++ or Python might not make much of a difference in execution time, because the model has to be compiled each time the optimiser proposes a new hyper-parameter vector (am I wrong here?).
If it is possible to compile once, and test all hyper-parameters between the lower and upper bounds, then C++ would be my go to solution(Is this possible in any of the open source DNN languages?).
If anyone has done this exercise before, please advice.
Thank you all for your help.

See looking at your problem, one way to implement this is to use genetic/evolutionary algorithm. Considering that I understood your problem correctly, you want to sweep through all the hyper-parameters to get the get the best solution.
So, I would recommend using python for this and tensorflow, keras all support this. So this might not be a problem.
Note - If I understood your question differently, then please feel free to correct me.

Related

Regression problem optimization using ML or DL

I have some data (data from sensors and etc.) from an energy system. consider the x-axis is temperature and the y-axis is energy consumption. Suppose we just have data and we don't have access to the mathematical formulation of the problem:
energy consumption vs temperature curve
In the above figure, it is absolutely obvious that the optimum point is 20. I want to predict the optimum point using ML or DL models. Based on the courses that I have taken I know that it's a regression supervised learning problem, however, I don't know how can I do optimization on this kind of problem.
I don't want you to write a code for this problem. I just want you to give me some hints and instructions about doing this optimization problem.
Also if you recommend any references or courses, I will welcome them to learn how to predict the optimum point of a regression supervised learning problem without knowing the mathematical formulation of the problem.
There are lots of ways that you can try when it comes to optimizing your model, for example, fine tuning your model. What you can do with fine tuning is to try different options that a model consists of and find the smallest errors or higher accuracy based on the actual and predicted data.
Using DecisionTreeRegressor model, you can try to use different split criterion, limit the minimum number of split & depth to see which give you the best predicted scores/errors. For neural network model, using keras, you can try different optimizers, try different loss functions, tune your parameters etc. and try all out as a combination of model.
As for resources, you can go Google, Youtube, and other platform to use keywords such as "fine tuning DNN model" and a lot of resources will pop up for your reference. The bottom line is that you will need to try out different models and fine tune your model until when you are satisfied with your results. The results will be based on your judgement and there is no right or wrong answers (i.e., errors are always there), it just completely up to you on how would you like to achieve your solution with handful of ML and DL models that you got. My advice to you is to spend more time on getting your hands dirty. It will be worth it in the long run. HFGL!

How does optimization happen in the parameters of the algorithms in sci-kit learn library?

When Machine Learning is seen mathematically, we have cost functions, to reduce the error in the prediction for the next time and we keep on optimizing the parameters of the equation/s used in the particular algorithm.
I wonder where does this optimization happen in the library
Sci-kit learn.
There is no function for doing this job, so far I know,there are rather a bunch of algorithms as functions.
Can someone please tell me how do I optimize those parameters in sci-kit learn, and is there a way to do it in the mentioned library or is it just for learning purposes.
I saw the code of library of logistic regression but got nothing.
Any effort is appreciated.
I got it.
GridsearchCV is the answer, thats what I was looking for.
I think it allows us to choose the values of alpha, c and number of iterations, therefore, not allowing to alter the values of weights directly and I think thats ok or thats how we'd assign values to those parameters after carrying out the same process independtly.
This article helped me to understand it well.

What to do when only a portion of training/testing data generates confident predictions?

I have a general question on machine learning that can be applied to any algorithm. Suppose I have a particular problem, let us say soccer team winning/losing prediction. The features I choose are the amount of sleep each player gets before the game, sentiment analysis on news coverage, etc etc.
In this scenario, there is a pattern or correlation (something only a machine learning algorithm can pick up on) that only occurs around 5% of the time. But when it occurs, it is very predictive of the upcoming match.
How do you setup a machine learning algorithm to handle such a case in which it has the ability to discard most samples as noise. For example, consider a binary SVM. If there was a way to discard most of the “noisy” samples, a lot less overfitting would occur because the hyperplane would not have to eliminate error from these samples.
Regularization would help in this case, but due to the very low percentage of predictive information, is there a way we can code the algorithm to discard these samples in training and refuse to predict certain test data samples?
I have also read into confidence intervals but they seem more of an analytic tool to me than something to use in the algorithm.
I was thinking that using another ml algorithm which uses the same features to decide which testing samples are keepers might be a good idea.
Any answers using any machine learning algorithm (e.g. svm, neural net, random forest) as an example would be much appreciated. Any suggestions on where to look would be great as well (google is usually my friend, but not this time). Please let me know if I can rephrase the question better. Thanks.

Parallel Logistic Regression with GPU in Python (MinPy)

I just managed to set up an Anaconda Environment to run this example from MinPy.
Now as I understand it, the parallelization is in the training and predicting part.
However, for my specific use case I want to go one level higher:
Since I have one very large data set that is split up in fairly small ones I want to do many multinomial regressions for each of the subsets.
Is there a way I can parallelize at this high of a level?
Since I am very new to the topic I simply do not know how to approach this problem.
Thanks in advance for any advice :)

Dataset with only values (0,1,-1) with LSTM or CNN is giving 50% accuracy where as RF, SVM, ELM, Neural networks are giving above 90%

I have a dataset with 11k instances containing 0s,1s and -1s. I heard that deep learning can be applied to feature values.Hence applied the same for my dataset but surprisingly it resulted in less accuracy (<50%) compared to traditional machine learning algos (RF,SVM,ELM). Is it appropriate to apply deep learning algos to feature values for classification task? Any suggestion is greatly appreciated.
First of all, Deep Learning isn't a mythical hammer you can throw at every problem and expect better results. It requires careful analysis of your problem, choosing the right method, crafting your network, properly setting up your training, and only then, with a lot of luck will you see significantly better results than classical methods.
From what you describe (and without any more details about your implementation), it seems to me that there could have been several things going wrong:
Your task is simply not designed for a neural network. Some tasks are still better solved with classical methods, since they manually account for patterns in your data, or distill your advanced reasoning/knowledge into a prediction. You might not be directly aware of it, but sometimes neural networks are just overkill.
You don't describe how your 11000 instances are distributed with respect to the target classes, how big the input is, what kind of preprocessing you are performing for either method, etc, etc. Maybe your data is simply processed wrong, your training is diverging due to unfortunate parameter setups, or plenty of other things.
To expect a reasonable answer, you would have to share at least a bit of code regarding the implementation of your task, and parameters you are using for training.

Categories

Resources