Get accuracy metric from the leaderboard function when using H2O AI - Python

I am running a binary classification model using H2O AutoML. I have explicitly told AutoML to treat this as a classification problem with the following line of code.
# This line of code turns our int variable into a factor.
# This is necessary to tell H2O that we want a classification model
feature_data['Radius'] = feature_data['Radius'].asfactor()
After running H2O AutoML for a minute and then using the following lines of code:
lb = aml.leaderboard
lb.head()
lb.head(rows=lb.nrows) # Entire leaderboard
I got the output in the screenshot below
As you can see, the metrics used for classification are AUC and logloss but what I want to see is accuracy. What should I add to get such an output?

It doesn't look like the leaderboard allows you to sort using accuracy as a metric. The following lines of code and text have been directly taken from the documentation:
aml = H2OAutoML(max_runtime_secs = 30, sort_metric = "logloss")
For binomial classification choose between AUC, "logloss", "mean_per_class_error", "RMSE", "MSE".
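That said, you can still compute accuracy per model once the run finishes. A minimal sketch, assuming a held-out H2OFrame named test (the frame name is an assumption, and accuracy() reports the value at the max-accuracy threshold by default):
# Hedged sketch: compute accuracy for every leaderboard model on a test frame.
import h2o
model_ids = list(aml.leaderboard['model_id'].as_data_frame()['model_id'])
for mid in model_ids:
    model = h2o.get_model(mid)
    perf = model.model_performance(test)   # `test` is an assumed held-out H2OFrame
    print(mid, perf.accuracy())             # [[threshold, accuracy]] pair(s)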

You can assign "accuracy" as the sorting metric when building the H2O AutoML model, like:
aml = H2OAutoML(max_runtime_secs = 30, sort_metric = "accuracy")
The leaderboard will then be ranked by accuracy from top to bottom.

Related

How to deal with PyCaret adding extra features while modelling? (For reusing the model)

After importing PyCaret I called setup(mydf, 'mytarget') and ran compare_models(). Then I wanted to save a model from the comparison list and use it on another dataset. What I did was something like: lr = create_model('lr').
However, when I tried lr.predict(mynewdfwithouttarget) I got a size mismatch error:
X has 11 features per sample; expecting 37
Other models in the list also output the same (or a similar) error.
So, what is the way to use the models that were trained inside compare_models()?
Thank you.
The mismatch happens because setup() fits a preprocessing pipeline (encoding, feature engineering, and so on) that expands your original columns; calling the estimator's own .predict() bypasses that pipeline, while predict_model() applies it to new data for you. The standard workflow is:
Create model:
lr = create_model('lr')
Predict on test / hold-out Sample:
predict_model(lr);
Finalize Model for Deployment:
final_lr = finalize_model(lr)
Predict on new data:
predictions = predict_model(final_lr, data = mynewdfwithouttarget)
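If you also want to persist the finalized pipeline and reuse it later (per the question title), a short sketch using PyCaret's save/load helpers; the file name is arbitrary, and the import comes from whichever module (classification or regression) you used for setup():
# Hedged sketch: save the finalized pipeline to disk and reload it for scoring.
from pycaret.classification import save_model, load_model, predict_model
save_model(final_lr, 'final_lr_pipeline')     # writes final_lr_pipeline.pkl
loaded = load_model('final_lr_pipeline')
predictions = predict_model(loaded, data = mynewdfwithouttarget)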

How to replace sts.LinearRegression with a non-linear model for a TensorFlow Probability Structural Time Series model component

I'm creating a time-series forecasting model with external, controllable features similar to the "Forecasting Demand for Electricity" example found at https://medium.com/tensorflow/structural-time-series-modeling-in-tensorflow-probability-344edac24083. In order to model the influence of the external factors, I am using an sts.LinearRegression() as a component of my model, but those external factors are very non-linear in nature and it's causing unwanted negative predictions in my model.
I've tried creating (simpler) forecasts outside of TFP STS, and found that a RandomForestRegressor works much better than a LinearRegressor for these external features. What I'd LIKE to do is replace the sts.LinearRegression() with an sts.RandomForestRegressor(), but that isn't available from the sts library. In fact, there are hardly any options available in the sts library: https://www.tensorflow.org/probability/api_docs/python/tfp/sts/LinearRegression
I've also tried converting my target variable to log form, but there are numerous instances of zeros (whose log is -inf), and this doesn't turn out to be a useful transformation.
My model architecture for TFP STS looks something like this:
def build_model(observed_time_series):
    season_effect = sts.Seasonal(
        num_seasons = 4, num_steps_per_season = 13,
        observed_time_series = observed_time_series,
        name = 'season_effect')
    marketing_effect = sts.LinearRegression(
        design_matrix = tf.stack([recent_publicity - np.mean(recent_publicity),
                                  active_ad - np.mean(active_ad)], axis = -1),
        name = 'marketing_effect')
    autoregressive = sts.Autoregressive(
        order = 1,
        observed_time_series = observed_time_series,
        name = 'autoregressive')
    model = sts.Sum([season_effect,
                     marketing_effect,
                     autoregressive],
                    observed_time_series = observed_time_series)
    return model
Where I want to change the "marketing_effect" component of the model to something non-linear.
Is my only option here to clone the TFP STS library and create a custom function to handle non-linear data with something like a Random Forest Regressor? Does anyone know of a better option?
I'm not familiar with the usage of random forests in sts models. Can you point to a system where this exists? The trick with tfp.sts is that the math all works out nicely and analytically because everything is marginally Gaussian. If we can make that work, I think we're definitely open to bringing in other models.
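One workaround that stays inside that linear-Gaussian framework is to keep sts.LinearRegression but hand it a nonlinearly expanded design matrix, so the component is nonlinear in the original regressors while remaining linear in its weights. A rough sketch of the idea, reusing recent_publicity and active_ad from the question (the particular basis functions are only illustrations, not a recommendation):
# Hedged sketch: expand each external regressor into fixed nonlinear basis
# columns before stacking them into the design matrix.
def expand(series):
    centered = series - np.mean(series)
    return [centered, np.square(centered), np.log1p(np.abs(series))]

marketing_effect = sts.LinearRegression(
    design_matrix = tf.stack(expand(recent_publicity) + expand(active_ad), axis = -1),
    name = 'marketing_effect')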

How to use a trained model on test data and plot a graph?

I know this question has been asked more than once, but I couldn't understand the code or the logic behind it.
For my data set, I first created a sigmoid layer, then connected this layer to the output layer, where I used a softmax activation.
fl = tf.layers.dense(x, 10,activation=tf.sigmoid)
output = tf.layers.dense(fl, 2,activation=tf.nn.softmax)
I created the loss and accuracy ops, initialized the variables, set up the optimizer and training op, and then started running on my data:
loss = tf.losses.softmax_cross_entropy(onehot_labels=y,logits=output)
accuracy = tf.metrics.accuracy(tf.argmax(y_train,1),tf.argmax(output,1))
# inits
init_local = tf.local_variables_initializer()
init_global = tf.global_variables_initializer()
sess.run(init_global)
sess.run(init_local)
optimizer = tf.train.GradientDescentOptimizer(rate)
train = optimizer.minimize(loss)
for i in range(1000):
    _, lv = sess.run((train, loss))
    if i % 5 == 0:
        print("L: " + str(lv))
        print("Accuracy: " + str(sess.run(accuracy)))
I can see that my loss value decreases every time I run on the training set. And my accuracy is ~0.93.
The problem is, from now on, I don't know how to test this model with real data.
Also, how can I draw a histogram of my real data? I have correct labels for my real data as well.
I will assume that you use a Dataset to feed your training data and that you want to run on test data immediately after training (since you don't have checkpoints in your code).
When using Dataset, you would create an iterator and call get_next() on it. Then, you would use the return values of get_next() as inputs to your model.
To run your model on the test data, you can use two high-level approaches:
If your test data has the same format as your train data, create a dataset that reads your test data. Then, create another copy (sometimes called a "tower") of your model (operations will be new but variables will be shared) that uses the test Dataset. Then, use sess.run() similarly to how you use it for training - you might not need to compute loss or train, but only accuracy.
If your test data has a different format, you can feed it directly by using the feed_dict argument to sess.run(). You would feed your test data as values for the tensors returned from get_next(). Usually, one feeds placeholders, but TensorFlow allows you to feed any tensor.
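A minimal sketch of the second approach, assuming x and y are the tensors your model reads its inputs and one-hot labels from, and x_test / y_test are NumPy arrays (all of these names are placeholders for illustration):
# Hedged sketch: feed test data directly into the model's input tensors.
acc_value, acc_update = tf.metrics.accuracy(
    labels=tf.argmax(y, 1), predictions=tf.argmax(output, 1))
sess.run(tf.local_variables_initializer())   # reset the metric's internal counters
sess.run(acc_update, feed_dict={x: x_test, y: y_test})
print("Test accuracy: " + str(sess.run(acc_value)))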
As for histograms, TensorBoard has a nice way of visualizing them: https://www.tensorflow.org/programmers_guide/tensorboard_histograms.
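For example, a small sketch that writes a histogram of the model's outputs on the test data for TensorBoard (the log directory name is arbitrary):
# Hedged sketch: log a histogram summary of the predictions to TensorBoard.
hist_op = tf.summary.histogram("test_predictions", output)
writer = tf.summary.FileWriter("./logs", sess.graph)
summ = sess.run(hist_op, feed_dict={x: x_test})
writer.add_summary(summ, global_step=0)
writer.close()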

Different predictions on multiple runs of the same algorithm - scikit neural network

Since an MLP can implement any function, I have written the following code, with which I am trying to implement the AND function. But I find that on running the program multiple times, I end up getting different predicted values. Why is this happening? Also, how does one decide which type of activation function to use at different layers?
from sknn.mlp import Regressor,Layer,Classifier
import numpy as np
X_train = np.array([[0,0],[0,1],[1,0],[1,1]])
y_train = np.array([0,0,0,1])
nn = Classifier(layers=[Layer("Softmax", units=2),Layer("Linear", units=2)],learning_rate=0.001,n_iter=25)
nn.fit(X_train, y_train)
X_example = np.array([[0,0],[0,1],[1,0],[1,1]])
y_example = nn.predict(X_example)
print (y_example)
- The different values obtained on every run are because your weights are randomly initialized (see the sketch below for how to make runs reproducible).
- Activation functions have different properties. You can either use your experience to decide which is best for your situation, or you can read up on how they work (https://stats.stackexchange.com/questions/115258/comprehensive-list-of-activation-functions-in-neural-networks-with-pros-cons).
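A minimal sketch of pinning the randomness, with a more conventional layer stack (sigmoid hidden layer, softmax output) shown purely for illustration; the random_state keyword is an assumption, so check your sknn version's signature:
# Hedged sketch: seed the RNGs so repeated runs give the same predictions.
import numpy as np
from sknn.mlp import Classifier, Layer

np.random.seed(0)  # seed NumPy, which the weight initialization may draw from
nn = Classifier(
    layers=[Layer("Sigmoid", units=2), Layer("Softmax", units=2)],
    learning_rate=0.001,
    n_iter=25,
    random_state=0)  # assumed keyword; verify it exists in your sknn version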

How to do supervised deepbelief training in PyBrain?

I am having trouble getting the DeepBeliefTrainer to work on my data in PyBrain/Python. Since I can't find any examples other than unsupervised ones of how to use deep learning in PyBrain, I hope that someone can give an example that shows the basic concept of usage.
I have tried to initialize using:
epochs = 100
layerDims = [768,100,100,1]
net = buildNetwork(*layerDims)
dataset = self.dataset
trainer = DeepBeliefTrainer(net, dataset=dataset)
trainer.trainEpochs(epochs)
I try to use a SupervisedDataSet for regression, but the training just fails. Has anyone succeeded in using the deep learning trainer for supervised machine learning? And how did you do it?
Error I get:
File "/Library/Python/2.7/site-packages/PyBrain-0.3.1-py2.7.egg/pybrain/structure/networks/rbm.py", line 39, in __init__
self.con = self.net.connections[self.visible][0]
KeyError: None
It's because your initial network, net = buildNetwork(*layerDims), doesn't have a layer with the name of the visible layer in your deep belief network, which is 'visible'. So, in order for it to be found in the initial network, you can do something like:
net.addInputModule(LinearLayer(input_dim, 'visible'))
[...]
trainer = DeepBeliefTrainer(net, dataset=dataset)
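A fuller sketch of building the initial network with an explicitly named visible layer, using the dimensions from the question; whether DeepBeliefTrainer also expects particular names for the hidden layers depends on the PyBrain version, so treat this as a starting point:
# Hedged sketch: construct the network manually so the input layer is named 'visible'.
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, FullConnection

net = FeedForwardNetwork()
visible = LinearLayer(768, 'visible')
hidden1 = SigmoidLayer(100, 'hidden1')
hidden2 = SigmoidLayer(100, 'hidden2')
out = LinearLayer(1, 'out')
net.addInputModule(visible)
net.addModule(hidden1)
net.addModule(hidden2)
net.addOutputModule(out)
net.addConnection(FullConnection(visible, hidden1))
net.addConnection(FullConnection(hidden1, hidden2))
net.addConnection(FullConnection(hidden2, out))
net.sortModules()

trainer = DeepBeliefTrainer(net, dataset=dataset)
trainer.trainEpochs(epochs)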
