Does pack-pad sacrifices accuracy? - python

I tried pack-pad technique. It does reduce many computing time a lot, but my accuracy is significantly worse than former one.
It sounds make sense with approximation theory that keep longer computing time you will get more fine value. But this time it is neural network. I don't think training by embedded zero vector will get me more accuracy. Please correct me if I am wrong.
Here is my files. If you would like to see my code.
This is plain one with 50% accuracy
pack-pad with 25% accuracy
It is not a homework assignment. It is my self-study.
Questions:
1. Am I get a correct result?
2. Does pack-pad sacrifices accuracy?
PS:
I feel my question would fit to stackoverflow most than Datascience or CodeReview. Please let me move it if need.

Related

"reg_alpha" parameter in XGBoost regressor. Is it bad to use high values?

I'm performing hyperparameter tuning with grid search and I realized that I was getting overfitting... I tried a lot of ways to reduce it, changing the "gamma", "subsample", "max_depth" parameters to reduce it, but I was still overfitting...
Then, I increased the "reg_alpha" parameters value to > 30....and them my model reduced overfitting drastically. I know that this parameter refers to L1 regularization term on weights, and maybe that's why solved my problem.
I just want to know if it has any problem using high values for reg_alpha like this?
I would appreciate your help :D
reg_alpha penalizes the features which increase cost function. Meaning it finds the features that doesn't increase accuracy. But this makes prediction line smoother.
On some problems I also increase reg_alpha > 30 because it reduces both overfitting and test error.
But if it is a regression problem it's prediction will be close to mean on test set and it will maybe not catch anomalies good.
So i may say you can increase it as long as your test accuracy doesn't begin to fall down.
Lastly when increasing reg_alpha , keeping max_depth small might be a good practise.

What to do when only a portion of training/testing data generates confident predictions?

I have a general question on machine learning that can be applied to any algorithm. Suppose I have a particular problem, let us say soccer team winning/losing prediction. The features I choose are the amount of sleep each player gets before the game, sentiment analysis on news coverage, etc etc.
In this scenario, there is a pattern or correlation (something only a machine learning algorithm can pick up on) that only occurs around 5% of the time. But when it occurs, it is very predictive of the upcoming match.
How do you setup a machine learning algorithm to handle such a case in which it has the ability to discard most samples as noise. For example, consider a binary SVM. If there was a way to discard most of the “noisy” samples, a lot less overfitting would occur because the hyperplane would not have to eliminate error from these samples.
Regularization would help in this case, but due to the very low percentage of predictive information, is there a way we can code the algorithm to discard these samples in training and refuse to predict certain test data samples?
I have also read into confidence intervals but they seem more of an analytic tool to me than something to use in the algorithm.
I was thinking that using another ml algorithm which uses the same features to decide which testing samples are keepers might be a good idea.
Any answers using any machine learning algorithm (e.g. svm, neural net, random forest) as an example would be much appreciated. Any suggestions on where to look would be great as well (google is usually my friend, but not this time). Please let me know if I can rephrase the question better. Thanks.

How do I better process my data and set parameters for my Neural Network?

When I run my NN the only way to get any training to occur is if I divide X by 1000. The network also needs to be trained under 70000 times with a 0.03 training rate and if those values are larger the NN gets worse. I think this is a due to bad processing of data and maybe the lack of having biases, but I don't really know.
Code on Google Colab
In short: all of the problems you mentioned and more.
Scaling is essential, typically to 0 mean and a variance of 1. Otherwise, you will quickly saturate the hidden units, their gradients will be near zero and (almost) no learning will be possible.
Bias is mandatory for such ANN. It's like an offset for fitting linear function. If you drop it, getting good fit will be very difficult.
You seem to be checking accuracy on your training data.
You have very few training samples.
Sigmoid is proven to be poor choice. Use ReLU and check e.g. here for explanation.
Also, I'd recommend spending some time on learning Python before going into this. For starter, avoid using global, it can get you unforeseen behaviour if you're not careful.

Hyper-parameter Optimisation in Cpp?

I need to fit a deep neural network to data coming from a data generating process, think of an AR(5). So I have five features per observation and one y for some large number N observations in each simulation. I am interested only in the root mean squared error of the best performing DNN in each simulation.
Since it's a simulation setting, I have to do a large number of these simulations and within each simulation fit a neural network to the data. The only reasonable way I can think of doing this is fit the DNN via hyper-parameter optimisation given each simulation (dlib's find_min_global will be my optimiser).
Does it make sense to do this exercise in C++ (slow development because I am not proficient) or Python (faster iteration because I am fairly proficient).
From where I am sitting, C++ or Python might not make much of a difference in execution time, because the model has to be compiled each time the optimiser proposes a new hyper-parameter vector (am I wrong here?).
If it is possible to compile once, and test all hyper-parameters between the lower and upper bounds, then C++ would be my go to solution(Is this possible in any of the open source DNN languages?).
If anyone has done this exercise before, please advice.
Thank you all for your help.
See looking at your problem, one way to implement this is to use genetic/evolutionary algorithm. Considering that I understood your problem correctly, you want to sweep through all the hyper-parameters to get the get the best solution.
So, I would recommend using python for this and tensorflow, keras all support this. So this might not be a problem.
Note - If I understood your question differently, then please feel free to correct me.

How to speed up up Stochastic Gradient Descent?

I'm trying to fit a regression model with an L1 penalty, but I'm having trouble finding an implementation in python that fits in a reasonable amount of time. The data I've got is on the order of 100k by 500 (sidenote; several of the variables are pretty correlated), but running the sklearn Lasso implementation on this takes upwards of 12 hours to fit a single model (I'm not actually sure of the exact time, I've left it running overnight several times and it never finished).
I've been looking into Stochastic Gradient Descent as a way to get the job done faster. However, the SGDRegressor implementation in sklearn takes on the order of 8 hours to fit when I'm using 1e5 iterations. This seems like a relatively small amount (and the docs even suggest that the model often takes around 1e6 iters to converge).
I'm wondering if there's something that I'm being stupid about which is causing the fits to take a really long time. I've been told that SGD is often used for its efficiency (something around O(n_iter * n_samp * n_feat), though so far I haven't seen much improvement over Lasso.
To speed things up, I have tried:
Decreasing n_iter, but this often leads to a pretty bad solution because it hasn't converged yet.
Increasing the step size (and decreasing n_iter), but this often makes the loss function explode
Changing the learning rate types (from inverse scaling to an amount based off of the number of iterations), and this also didn't seem to make a huge difference.
Any suggestions for speeding this process up? It seems like partial_fit might be part of the answer, though the docs on this are somewhat sparse. I'd love to be able to fit these models without waiting for three days apiece.
Partial_fit is not the answer. It will not speed anything up. If anything, it would make it slower.
The implementation is pretty efficient, and I am surprised that you say convergence is slow. You do way to many iterations, I think. Have you looked at how the objective decreases?
Often tuning the initial learning rate can give speedups. Your dataset really shouldn't be a problem. I'm not sure if SGDRegressor does that internally, but rescaling your target to unit variance might help.
You could try vopal wabbit, which is an even faster implementation, but it shouldn't be necessary.

Categories

Resources