I started using the scikit-learn package a while ago to implement random forests on my data set. I am trying to build a model for multiple classes with RandomForestClassifier. However, I think my data is imbalanced, so I want to use the class_weight="auto" parameter:
RFC = RandomForestClassifier(n_estimators=int(trees), class_weight="auto").fit(X_train, y_train)
However, when I try to run it, I get:
__init__() got an unexpected keyword argument 'class_weight'
I checked other questions, since I thought I wasn't using the correct notation, but they all seem to reference class_weight="auto" in exactly this way.
Note: the random forest works without the class_weight parameter. I just want to try to improve my results, because I think the data is imbalanced.
Thanks (if I did something wrong with the formatting or the question, I will edit it; this is my first question here).
I made the mistake of checking the wrong version list. I run in IPython, and while I did update scikit-learn on the server, the update didn't take effect in the IPython environment; every time I checked the version with conda, the IPython environment wasn't active.
I updated it and it worked, thanks.
Sorry, but thanks for looking into it.
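For anyone hitting the same error, a minimal version sanity check (assuming the trees, X_train, and y_train from the question): class_weight was only added to RandomForestClassifier in scikit-learn 0.16, so print the version from inside the same environment you train in.

import sklearn
from sklearn.ensemble import RandomForestClassifier

print(sklearn.__version__)  # class_weight requires scikit-learn >= 0.16

# In 0.17+, "balanced" replaces the deprecated "auto" option.
RFC = RandomForestClassifier(n_estimators=int(trees), class_weight="balanced")
RFC.fit(X_train, y_train)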
Are the two lines below basically the same thing?
tf.keras.layers.experimental.preprocessing.Normalization()
tf.keras.layers.Normalization()
I am trying to normalize (standardize, in this case) the inputs for fitting a neural network model using TensorFlow. After googling, I found the two choices above. They seem to be the same thing, but I'm not sure. If they aren't the same, could anyone tell me the exact difference?
They are technically the same layer, but you should use this one:
tf.keras.layers.Normalization()
because the experimental path is no longer available in recent TensorFlow releases:
tf.keras.layers.experimental.preprocessing.Normalization()
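As a small illustration of the recommended layer (the toy data here is made up):

import numpy as np
import tensorflow as tf

data = np.array([[0.0], [10.0], [20.0], [30.0]], dtype="float32")  # toy data

norm = tf.keras.layers.Normalization()  # standardizes to zero mean, unit variance
norm.adapt(data)                        # learns mean/variance from the data
print(norm(data))                       # standardized outputs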
I am new to Python and machine learning. I have searched the internet regarding my question and tried the solutions people have suggested, but still can't get it to work. I would really appreciate it if anyone could help me out.
I am working on my first XGBoost model. I have tuned the parameters using xgb.XGBClassifier, and now I would like to enforce monotonicity on some model variables. It seems I have to use xgb.train() to enforce monotonicity, as shown in my code below.
The Booster returned by xgb.train() has a predict() method, but NOT predict_proba(). So how can I get probabilities from xgb.train()?
I have tried using 'objective':'multi:softprob' instead of 'objective':'binary:logistic', and then score = bst_constr.predict(dtrain). But the score does not seem right to me.
Thank you so much.
params_constr = {
    'base_score': 0.5,
    'learning_rate': 0.1,
    'max_depth': 5,
    'min_child_weight': 100,
    'n_estimators': 200,
    'nthread': -1,
    'objective': 'binary:logistic',
    'seed': 2018,
    'eval_metric': 'auc',
}
params_constr['monotone_constraints'] = "(1,1,0,1,-1,-1,0,0,1,-1,1,0,1,0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,)"
dtrain = xgb.DMatrix(X_train, label=y_train)
bst_constr = xgb.train(params_constr, dtrain)
X_test['score']=bst_constr.predict_proba(X_test)[:,1]
AttributeError: 'Booster' object has no attribute 'predict_proba'
Based on my understanding, you are trying to obtain the probability for each class in the prediction phase. There are two options.
It seems that you are using the XGBoost native API. In that case, keep using bst_constr.predict instead of bst_constr.predict_proba: with 'objective':'binary:logistic' the Booster's predict already returns the probability of the positive class, and with 'objective':'multi:softprob' it returns one probability per class.
XGBoost also provides a scikit-learn API. In that case you should initialize the model with bst_constr = xgb.XGBClassifier(**params_constr) and train with bst_constr.fit(X_train, y_train). Then you can call bst_constr.predict_proba to obtain what you want. You can refer to the Scikit-Learn API section of the XGBoost documentation for more details.
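A minimal sketch of both options, reusing params_constr, dtrain, X_train, y_train, and X_test from the question (depending on your XGBoost version, some keys such as nthread or seed may need renaming for the scikit-learn wrapper):

import xgboost as xgb

# Option 1: native API. With 'objective':'binary:logistic',
# Booster.predict() already returns P(y=1) for each row.
bst_constr = xgb.train(params_constr, dtrain)
proba_pos = bst_constr.predict(xgb.DMatrix(X_test))  # shape (n_samples,)

# Option 2: scikit-learn API, which does expose predict_proba.
clf = xgb.XGBClassifier(**params_constr)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class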
Several days ago, I used the sklearn multilayer perceptron module for predictions.
Now I am trying to change the cost function of the neural network, which may make the predictions more accurate. I have added the new cost function to _base.py and also changed some code in multilayer_perceptron.py. However, when I try to import the package and the module, I get a 'no module named ...' error. I tried several ways to solve this problem, like checking the __init__.py file and checking PYTHONPATH, but they didn't work.
So, could you please give me some guidance on how to change the cost function? I would appreciate it; thank you so much.
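A minimal sanity check for this kind of import problem is to print the location of the module Python actually loads and confirm it is the copy you edited:

import sklearn
import sklearn.neural_network

print(sklearn.__version__)
print(sklearn.neural_network.__file__)  # the file Python really imports;
                                        # it should be the copy you edited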
Maybe this is a stupid question, but I recently switched from plain TensorFlow to tflearn, and while I knew little of TensorFlow, I know even less of tflearn, as I have just begun to experiment with it. I was able to create a network, train it, and generate a model that achieved a satisfactory metric. I did all of this without using a TensorFlow session, because a) none of the documentation I was looking at suggested it, and b) I didn't even think to use one.
However, I would like to predict a value for a single input (the model performs regression on images, so I'm trying to get a value for a single image), and now I'm getting an error that the convolutional layers need to be initialized (specifically, FailedPreconditionError: Attempting to use uninitialized value Conv2D/W).
The only thing I've added, though, is these two lines:
model = Evaluator(network)
model.predict(feed_dict={input_placeholder: image_data})
I'm asking this as a general question because my actual code is troublesome to post here; admittedly, I've been very sloppy writing it. I will mention, however, that even if I start a session, initialize all variables before that second line, and then run the line in the session, I get the same error.
Succinctly put: does tflearn require a session if I haven't used TensorFlow directly anywhere in my code? If so, does the model need to be trained in that session? And if not, what about those two lines would cause such an error?
I'm hoping it isn't necessary to post more code, but if this isn't a general issue and is actually specific to my code, then I can try to format it to be understandable here and edit the post.
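For comparison, the usual tflearn flow keeps everything inside the DNN wrapper, which manages its own session and variable initialization; a minimal sketch, assuming the network and image_data from the question and placeholder training data X, Y:

import tflearn

model = tflearn.DNN(network)       # the wrapper owns its own TensorFlow session
model.fit(X, Y, n_epoch=10)        # training initializes all variables
model.save("model.tfl")            # persist the trained weights

# Later, rebuild the same graph, then:
model.load("model.tfl")            # restore weights into the wrapper's session
prediction = model.predict([image_data])  # a batch containing one image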
I can't seem to correctly pass the parameters to train a random forest classifier in OpenCV from Python.
I wrote an implementation in C++ which worked correctly, but I do not get the same results in Python.
I found some sample code here: http://fossies.org/linux/misc/opencv-2.4.7.tar.gz:a/opencv-2.4.7/samples/python2/letter_recog.py
which seems to indicate that you should pass the parameters in a dict. Here is the code I am using:
rtree_params = dict(
    max_depth=11,
    min_sample_count=5,
    use_surrogates=False,
    max_categories=15,
    calc_var_importance=False,
    n_active_vars=0,
    max_num_of_trees_in_the_forest=1000,
    termcrit_type=cv2.TERM_CRITERIA_MAX_ITER,
)
classifier = cv2.RTrees()
classifier.train(train_data, cv2.CV_ROW_SAMPLE, label_data, params=rtree_params)
I can tell that the classifier is getting trained, but it is not nearly as accurate as the one I trained with the same parameters in C++. I'm fairly certain the parameters are being acknowledged, because I get different results when I tweak the values.
I did notice that when I write the classifier out to a file, it only contains one tree. I'm pretty sure this is the problem. I looked at the OpenCV implementation:
http://www.code.opencv.org/svn/gsoc2012/denoising/trunk/opencv-2.4.2/modules/ml/src/rtrees.cpp
Given my parameters, it should output a forest with 1000 trees. I tried setting the max_num_of_trees_in_the_forest argument to all sorts of crazy values, and it didn't change OpenCV's behaviour.
Thoughts?
Not sure if this will help much, but I believe:
n_active_vars=0
should be
nactive_vars=0
Also, you may wish to try experimenting with the term_crit parameter.
For example, try adding:
term_crit=(cv2.TERM_CRITERIA_MAX_ITER,1000,1)
into your dictionary.
I believe this will set the criteria to terminate once 1000 trees have been added to the forest.
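Putting both suggestions together, the dictionary might look like this (a sketch for the old OpenCV 2.4 Python API, with train_data and label_data assumed from the question):

import cv2

rtree_params = dict(
    max_depth=11,
    min_sample_count=5,
    use_surrogates=False,
    max_categories=15,
    calc_var_importance=False,
    nactive_vars=0,                                   # was n_active_vars
    max_num_of_trees_in_the_forest=1000,
    termcrit_type=cv2.TERM_CRITERIA_MAX_ITER,
    term_crit=(cv2.TERM_CRITERIA_MAX_ITER, 1000, 1),  # stop after 1000 trees
)

classifier = cv2.RTrees()
classifier.train(train_data, cv2.CV_ROW_SAMPLE, label_data, params=rtree_params)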