LGBMClassifier create_tree_digraph not showing in notebook - python

I am using LightGBM's LGBMClassifier for a binary classification problem and want to print out the actual tree diagram.
Here is how I trained/fit the model:
clf = lgb.LGBMClassifier()
clf.fit(x_train, y_train, categorical_feature=x_train.select_dtypes(include='category').columns.tolist())
And here is how I am trying to print the diagram:
lgb.create_tree_digraph(clf, orientation='vertical')
However, the only output I am getting is
<graphviz.graphs.Digraph at 0x7f6a10a5ed00>
I also tried building the model with the lower-level lightgbm.train() method and passing the resulting booster as the booster argument to create_tree_digraph, but I got similar output.
Is there an additional library or function I have to call to print out the tree, or is there another way to perhaps save it to a .png file?
I am using a Python notebook in Databricks.

That output is just the REPL telling you that you have a Digraph object in hand.
Good. Assign it to a variable and render it.
https://graphviz.readthedocs.io/en/stable/manual.html#basic-usage
g = lgb.create_tree_digraph(clf, orientation='vertical')
g  # in a notebook, evaluating g as the last expression of a cell displays it inline
g.format = "png"
g.render("output")  # writes the DOT source to "output" and the image to "output.png"
g.view()  # opens the rendered file in the system's default viewer (only useful locally)
This relies on the Graphviz dot executable being installed and on the PATH, so that the library can invoke it.
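If the inline display does not work in your Databricks notebook, one workaround is to render the graph to SVG in memory and hand it to Databricks' displayHTML helper. A minimal sketch (pipe() is graphviz's in-memory render; displayHTML is provided by Databricks notebooks):
svg = g.pipe(format='svg').decode('utf-8')  # render to SVG bytes without writing a file
displayHTML(svg)  # Databricks notebook helper that renders raw HTML/SVG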

Related

How to get Lime predictions vs Actual predictions in a dataframe?

I am working on a binary classification problem using a random forest and using a LIME explainer to explain the predictions.
I used the code below to generate the LIME explanations:
import lime
import lime.lime_tabular
explainer = lime.lime_tabular.LimeTabularExplainer(ord_train_t.values,
                                                   discretize_continuous=True,
                                                   feature_names=feat_names,
                                                   mode="classification",
                                                   feature_selection="lasso_path",
                                                   class_names=rf_boruta.classes_,
                                                   categorical_names=output,
                                                   kernel_width=10,
                                                   verbose=True)
i = 969
exp = explainer.explain_instance(ord_test_t.iloc[1,:], rf_boruta.predict_proba, distance_metric='euclidean', num_features=5)
I got output like the following:
Intercept 0.29625037124439896
Prediction_local [0.46168824]
Right: 0.6911888737552843
However, the above is only printed as a message on the screen.
How can we get this info into a dataframe?
LIME doesn't have a direct export-to-dataframe capability, so the way to go appears to be appending the explanations to a list and then transforming the list into a DataFrame.
Depending on how many predictions you have, this may take a lot of time, since the model has to explain every instance individually.
This is an example I found; the explain_instance call needs to be adjusted to your model's arguments, but it follows the same logic.
import pandas as pd

l = []
for n in range(X_test.shape[0]):  # note: range(0, X_test.shape[0]+1) would run one past the last row
    exp = explainer.explain_instance(X_test.values[n], clf.predict_proba, num_features=10)
    l.append(exp.as_list())
df = pd.DataFrame(l)
If you need more than what as_list() provides, the explanation object carries more data; I ran an example to see what else explain_instance retrieves.
Instead of appending only as_list(), you can append the other values you need to it:
a = exp.as_list()
a.append(exp.intercept[1])
l.append(a)
With this approach you can get the intercept and the prediction_local. For the "Right" value I don't really know which attribute it would be, but I am fairly certain the explanation object has it somewhere under another name.
Set a breakpoint in your code and explore the explanation object; there may be other info you want to save as well.
LIME GitHub: issue 213
To see the intercept and prediction_local of your explanation you can read exp.intercept and exp.local_pred on the object returned by explain_instance. See this blog for details.
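Putting the pieces together, a minimal sketch of a one-row-per-instance DataFrame (assuming the explainer, clf, and X_test names from above; the column layout here is illustrative, not part of the LIME API):
import pandas as pd

rows = []
for n in range(X_test.shape[0]):
    exp = explainer.explain_instance(X_test.values[n], clf.predict_proba, num_features=10)
    row = dict(exp.as_list())                    # feature condition -> local weight
    row["intercept"] = exp.intercept[1]          # surrogate intercept for class 1
    row["prediction_local"] = exp.local_pred[0]  # local surrogate model's prediction
    rows.append(row)
df = pd.DataFrame(rows)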

No console output using Keras model.fit() function

I'm following this tutorial to perform time series classifications using Transformers with Keras and TensorFlow. I'm using Windows 10 and the PyDev Eclipse plugin. Unfortunately, my program stops and the console output is completely blank every time I run the following code:
n_classes = len(np.unique(y_train))
input_shape = np.array(x_trainScaled).shape[0:]
model = build_model(n_classes, input_shape, head_size=256, num_heads=4, ff_dim=4,
                    num_transformer_blocks=4, mlp_units=[128], mlp_dropout=0.4, dropout=0.25)
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              metrics=["sparse_categorical_accuracy"])
print(model.summary())
callbacks = [keras.callbacks.EarlyStopping(patience=100, restore_best_weights=True)]
model.fit(x_trainScaled, y_train, validation_split=0.2, epochs=200, batch_size=64, callbacks=callbacks)
pathToModel = 'my/path/to/model/'
model.save(pathToModel)
Even earlier warnings and print statements are erased, and I have no idea what's going on. If I comment out the model.fit(...) statement, the program terminates, crashing with an error message caused by a subsequent model.predict(...) call.
Any help is highly appreciated.
The solution was to convert the input data and labels to NumPy arrays first. Calling the fit function as follows worked perfectly fine for me:
model.fit(np.array(x_trainScaled), np.array(y_train), validation_split=0.2, epochs=200, batch_size=64, callbacks=callbacks)
as opposed to:
model.fit(x_trainScaled, y_train, validation_split=0.2, epochs=200, batch_size=64, callbacks=callbacks)
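A brief note on why this helps: Keras expects NumPy arrays (or tensors), and list-like inputs can fail silently in some setups. A minimal sketch, assuming x_trainScaled and y_train are list-like, is to convert once up front so fit, predict, and save all see the same arrays:
import numpy as np

x_trainScaled = np.asarray(x_trainScaled, dtype="float32")  # features as a float array
y_train = np.asarray(y_train)                               # integer class labels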

Python Chefboost feature importance No file found like outputs/rules/rules_fi.csv

I am using Chefboost to build a CHAID decision tree and want to check the feature importance. For some reason, I get this error:
cb.feature_importance()
Feature importance calculation is enabled when parallelised fitting. It seems that fit function didn't called parallelised. No file found like outputs/rules/rules_fi.csv
This is my code:
from chefboost import Chefboost as cb
X_train['Decision'] = y_train
config = {'algorithm': 'CHAID', 'enableParallelism': enableParallelism}
cb.fit(X_train, config)
cb.feature_importance()
Can anybody help me with this?
Thanks.
The issue was solved by using the code below, with parallelism enabled so that outputs/rules/rules_fi.csv gets written:
from chefboost import Chefboost as cb
X_train['Decision'] = y_train
config = {'algorithm': 'CHAID', 'enableParallelism': True}
cb.fit(X_train, config)
cb.feature_importance()
You don't have to run in parallel anymore. The feature importance function now expects the exact path of rules.py; be sure to upgrade your chefboost library first.
config = {'algorithm': 'CHAID'}
model = cb.fit(X_train, config)
# get the decision rules
# decision_rules = "outputs/rules/rules.py"  # static way
decision_rules = model["trees"][0].__dict__["__spec__"].origin  # dynamic way
cb.feature_importance(decision_rules)

AttributeError: 'XGBClassifier' object has no attribute 'save_raw'

I keep getting the error mentioned in the title whenever I try to run my file. I'm basically using this file https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/training.py
and the error happens on line 38 at save_raw.
I've tried reinstalling different versions of xgboost with both pip and git clone; nothing seems to work. Can someone help me?
I am using the latest versions of scikit-learn, Python and xgboost.
if xgb_model is not None:
    if not isinstance(xgb_model, STRING_TYPES):
        xgb_model = xgb_model.save_raw()  # error here
    bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
    nboost = len(bst.get_dump())
I have experience with saving an **XGBRegressor**, and I think it is the same with **XGBClassifier**.
**save_model** and **load_model** work, but some objects will not be saved or loaded, as the load_model docstring notes:
def load_model(self, fname):
    """
    Load the model from a file.
    The model is loaded from an XGBoost internal binary format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be loaded.
    Label encodings (text labels to numeric labels) will be also lost.
    **If you are using only the Python interface, we recommend pickling the model object for best results.**
    """
So another solution is worth considering; for me, the pickle package works well:
import pickle

with open("boston_earlyStopping.dat", "wb") as f:
    pickle.dump(model, f)
with open("boston_earlyStopping.dat", "rb") as f:
    new_model = pickle.load(f)

new_model.best_ntree_limit
99
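For completeness, a hedged sketch of the native save_model/load_model route mentioned above (the file name "model.json" is an example; JSON output needs a reasonably recent xgboost, and as the docstring warns, Python-side attributes such as label encodings are not restored):
from xgboost import XGBClassifier

model.save_model("model.json")     # native XGBoost format, chosen by file extension
restored = XGBClassifier()
restored.load_model("model.json")  # booster is restored; auxiliary Python attributes are not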

How to read Scikit-Learn source code?

I am learning to use scikit-learn to build a decision tree. However, when I step through the example code, the core tree-building code appears to be empty.
I am using the following code:
from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
I went into the fit() method to see the details of the code, and I think the most important code for building the decision tree is the following call at line 362 of tree.py:
builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
However, when I go into the build method in _tree.py, I find that every method is empty, containing only the 'pass' keyword, such as:
""" Build a decision tree in depth-first fashion. """
def build(self, *args, **kwargs):  # real signature unknown
    """ Build a decision tree from the training set (X, y). """
    pass
I am puzzled by this strange code and can't figure it out. Am I looking at the wrong source? How could this code run?
I am using PyCharm as my IDE and Anaconda3 as my environment. It seems so strange.
Some of the modules in sklearn are compiled with Cython, so you can't find their source code in your installed folder.
They are shipped as compiled .pyd extension modules, which are binary and cannot be read directly.
The .pyd files are simply imported by the other .py files like any library module.
You can find the original source code in the scikit-learn Git repository as .pyx files (the file names are the same).
Cython syntax is a little different from Python syntax, especially when declaring variables.
If you want to change the code, you have to compile the .pyx into a .pyd yourself.
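A quick way to confirm this from a Python session (a minimal sketch; the exact file name in the output varies by platform and version, e.g. a .pyd on Windows or a .so on Linux/macOS):
import sklearn.tree._tree as t

print(t.__file__)  # path of the compiled extension module, not a readable .py file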
