How can I train my model using Python gensim?

I am trying to train my model, and when I run this code:

for epoch in range(max_epochs):
    model.train(tagged_data,
                total_examples=model.corpus_count,
                epochs=model.iter)

the error that I am getting is the following:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-6ecb8a2d0ac7> in <module>
      2 model.train(tagged_data,
      3             total_examples=model.corpus_count,
----> 4             epochs=model.iter)

AttributeError: 'Doc2Vec' object has no attribute 'iter'

You're likely copying some outdated example code. For example:

- recent versions of Gensim don't have an .iter property on the Doc2Vec model;
- it's almost always a bad idea to call train() multiple times in your own epochs loop, especially as a beginner just trying to get things working.

So: don't copy whatever source you're copying. It's not only out of date, it's suggesting something (the train() calls in a loop) that was never a great idea.
Instead, base your work on better examples, like the intro tutorial in the Gensim docs:
https://radimrehurek.com/gensim/auto_examples/tutorials/run_doc2vec_lee.html

To resolve the immediate error, change model.iter to model.epochs. For example:

for epoch in range(max_epochs):
    model.train(tagged_data,
                total_examples=model.corpus_count,
                epochs=model.epochs)
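Better yet, since train() already iterates over the corpus internally, you can drop the manual epochs loop entirely, as the answer above suggests. A minimal sketch, assuming tagged_data is your list of TaggedDocument objects and the hyperparameters are placeholders:

from gensim.models.doc2vec import Doc2Vec

# One train() call runs all epochs internally; no manual loop needed.
model = Doc2Vec(vector_size=50, min_count=2, epochs=40)
model.build_vocab(tagged_data)
model.train(tagged_data,
            total_examples=model.corpus_count,
            epochs=model.epochs)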

Related

Get OOB score within a pipeline for Random Forest

I was wondering for a machine learning project: is it possible to implement RandomForestRegressor inside a pipeline?
Specifically, I need to determine the OOB score from a RandomForestRegressor. But my data requires a lot of preprocessing.
I tried several things, and this is the closest so far:
# Creation of the pipeline
rand_piped = Pipeline([
    ('preprocessor', preprocessor),
    ('model', RandomForestRegressor(max_depth=3, random_state=0, oob_score=True))
])

# Fitting our model
rand_piped.fit(df_X_train, df_Y_train.values.ravel())

# Getting our metrics and predictions
oob_score = rand_piped.oob_score_
At the moment I think my problem is that I still have an unclear idea of this method, so feel free to correct me. It returns this error:
Traceback (most recent call last):
  File "/home/user/my_rf.py", line 15, in <module>
    oob_score = rand_piped.oob_score_
AttributeError: 'Pipeline' object has no attribute 'oob_score_'
Pipelines are subscriptable, so you can look up the oob_score_ in the model step:
>>> rand_piped["model"].oob_score_
0.9297212997034854
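Equivalently, the named_steps mapping on a fitted pipeline exposes the same fitted estimator, so this lookup (same names as above) returns the same value:

rand_piped.named_steps['model'].oob_score_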

How to use XLMRoberta in fine-tuning?

There are two problems I met when fine-tuning my code.
I was trying to use X_1 and X_2 for regression, and there are different languages in the corpus.
HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/xlm-roberta-base/resolve/main/tf_model.h5

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
/tmp/ipykernel_33/2123064688.py in <module>
     55 # )
     56
---> 57 model = TFXLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base',num_labels=1)

OSError: Can't load weights for 'xlm-roberta-base'. Make sure that:
- 'xlm-roberta-base' is a correct model identifier listed on 'https://huggingface.co/models'
  (make sure 'xlm-roberta-base' is not a path to a local directory with something else, in that case)
- or 'xlm-roberta-base' is the correct path to a directory containing a file named one of tf_model.h5, pytorch_model.bin.
This is my code:
tokenizer = XLMRobertaTokenizerFast.from_pretrained('xlm-roberta-base')

train_encoding = tokenizer(X_train_1, X_train_2, truncation=True, padding=True)
val_encoding = tokenizer(X_val_1, X_val_2, truncation=True, padding=True)

train_dataset = tf.data.Dataset.from_tensor_slices(
    (dict(train_encoding), y_train)
)
val_dataset = tf.data.Dataset.from_tensor_slices(
    (dict(val_encoding), y_val)
)

model = TFXLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base', num_labels=1)
There are several things you should know before diving deep into Hugging Face transformers:

- The preferred library for working with Hugging Face's transformers is PyTorch.
- For several widely used models you may find a TensorFlow version alongside, but not for all of them.
- Fortunately, there are ways to convert PyTorch checkpoints to TensorFlow and vice versa.
Finally, how to fix the code:

# Option 1: switch to PyTorch
tokenizer = XLMRobertaTokenizerFast.from_pretrained('xlm-roberta-base')
model = XLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base', num_labels=1)

# Option 2: use unofficial TensorFlow checkpoints
model = TFXLMRobertaForSequenceClassification.from_pretrained('jplu/tf-xlm-roberta-base', num_labels=1)

# Option 3: convert the PyTorch checkpoint to TensorFlow on the fly (not recommended!)
model = TFXLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base', from_pt=True, num_labels=1)

Tensorflow Load model to check what's in it

I am not very familiar with TensorFlow. I received a ".pb" file and was trying to see how they approached the problem.
model_path = os.path.join(saved_path, "model", str(k+1))
model = tf.saved_model.load(model_path)
print(model)

<tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject object at 0x7f6d200ec748>

model.summary()

AttributeError                            Traceback (most recent call last)
in <module>
----> 1 model.summary()

AttributeError: '_UserObject' object has no attribute 'summary'
Is there any way I can check the summary of the model?
I was wondering how they approached the segmentation task, and it seems like they did object detection instead. That is why I want to check what is inside the model.pb file!
Thank you.
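A model loaded with tf.saved_model.load is a generic SavedModel object, not a Keras model, so it has no .summary(). One common way to see what the graph expects is to inspect its signatures; a minimal sketch, assuming model_path is the same path as in the question and the model exports the default serving signature:

import tensorflow as tf

model = tf.saved_model.load(model_path)

# List the exported signatures (usually 'serving_default')
print(list(model.signatures.keys()))

# Inspect the input and output tensor specs of one signature
infer = model.signatures['serving_default']
print(infer.structured_input_signature)
print(infer.structured_outputs)

From the shell, saved_model_cli show --dir <model_path> --all prints the same signature information without writing any code.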

Tensorflow estimator error in google colab

I am training a DNN in TensorFlow in a Google Colab environment. The code worked well until yesterday, but now, when I run the estimator training section of my code, it gives an error.
I don't know exactly what the reason is. Is Google Colab using an updated version of TensorFlow, in which some functions are not compatible with older versions? I had no problem with the code before, and I didn't change it.
This problem seems to exist for other code too; for example, this sample code from Stanford ran without any error before:
https://colab.research.google.com/drive/1nG7Ga46jrWF5n7pHe0FK6anB0pLNgBVt
but now, when you run the section:
estimator.train(input_fn=train_input_fn, steps=1000);
it gives the same error as mine:
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)

TypeError: Expected binary or unicode string, got {'sent_symbol': <tf.Tensor 'random_shuffle_queue_DequeueMany:3' shape=(128,) dtype=int64>}

TypeError                                 Traceback (most recent call last)
<ipython-input-10-9dfe23a4bf62> in <module>()
----> 1 estimator.train(input_fn=train_input_fn, steps=1000);

TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'sent_symbol': <tf.Tensor 'random_shuffle_queue_DequeueMany:3' shape=(128,) dtype=int64>}. Consider casting elements to a supported type.
The y argument of the method tf.estimator.inputs.pandas_input_fn expects a Pandas Series object, not a DataFrame.
To extract the target 'sent_symbol' from the DataFrame as a Series, use training_labels['sent_symbol'].
To fix this script, modify the code as follows:
# Training input on the whole training set with no limit on training epochs.
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    training_examples, training_labels['sent_symbol'], num_epochs=None, shuffle=True)

# Prediction on the whole training set.
predict_train_input_fn = tf.estimator.inputs.pandas_input_fn(
    training_examples, training_labels['sent_symbol'], shuffle=False)

# Prediction on the test set.
predict_test_input_fn = tf.estimator.inputs.pandas_input_fn(
    validation_examples, validation_labels['sent_symbol'], shuffle=False)
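The fix works because selecting a single column from a DataFrame yields a Series, which is the type pandas_input_fn accepts for y. A quick sanity check, assuming training_labels is the one-column DataFrame from the question:

import pandas as pd

# A single-column selection is a Series, not a DataFrame
assert isinstance(training_labels, pd.DataFrame)
assert isinstance(training_labels['sent_symbol'], pd.Series)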

Does xgboost have feature_importances_?

I'm calling xgboost via its scikit-learn-style Python interface:
model = xgboost.XGBRegressor()
%time model.fit(trainX, trainY)
testY = model.predict(testX)
Some sklearn models tell you which importance they assign to features via the attribute feature_importances_. This doesn't seem to exist for the XGBRegressor:
model.feature_importances_
AttributeError                            Traceback (most recent call last)
<ipython-input-36-fbaa36f9f167> in <module>()
----> 1 model.feature_importances_

AttributeError: 'XGBRegressor' object has no attribute 'feature_importances_'
The weird thing is: For a collaborator of mine the attribute feature_importances_ is there! What could be the issue?
These are the versions I have:
In [2]: xgboost.__version__
Out[2]: '0.6'
In [4]: sklearn.__version__
Out[4]: '0.18.1'
... and the xgboost C++ library from github, commit ef8d92fc52c674c44b824949388e72175f72e4d1.
How did you install xgboost? Did you build the package after cloning it from GitHub, as described in the docs?
http://xgboost.readthedocs.io/en/latest/build.html
As in this answer:
Feature Importance with XGBClassifier
There always seems to be a problem with the pip installation of xgboost. Building and installing it from source seems to help.
This worked for me:
model.get_booster().get_score(importance_type='weight')
Hope it helps.
This may be useful for you:
xgb.plot_importance(bst)
And this is the link: plot
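Putting the two suggestions above together, a minimal sketch (trainX and trainY are the arrays from the question; in recent xgboost versions plot_importance accepts the fitted sklearn wrapper directly):

import matplotlib.pyplot as plt
import xgboost

model = xgboost.XGBRegressor()
model.fit(trainX, trainY)

# Per-feature counts of how often each feature is used to split
print(model.get_booster().get_score(importance_type='weight'))

# Bar chart of the same importances
xgboost.plot_importance(model)
plt.show()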
