After importing PyCaret and creating the clr_default:
from pycaret.classification import *
clr_default = setup(df_rain_definitivo_one_drop_catboost_norm_fs_dropna, fold_shuffle=True, target='RainTomorrow', session_id=123)
I've tried to use the compare_models() function in PyCaret with the following call:
best_model = compare_models()
However I get the following error message:
ValueError Traceback (most recent call last)
<ipython-input-228-e1d76b68915a> in <module>()
----> 1 best_model = compare_models(n_select = 5, sort='Accuracy')
1 frames
/usr/local/lib/python3.7/dist-packages/pycaret/internal/tabular.py in compare_models(include, exclude, fold, round, cross_validation, sort, n_select, budget_time, turbo, errors, fit_kwargs, groups, verbose, display)
1954 if sort is None:
1955 raise ValueError(
-> 1956 f"Sort method not supported. See docstring for list of available parameters."
1957 )
1958
ValueError: Sort method not supported. See docstring for list of available parameters.
I've also tried calling compare_models() with the sort parameter set to 'Accuracy', but it didn't do any good.
Also, I'm on Google Colab.
I don't get what n_select=5 is for; do you want to get the top 5 models? Otherwise, here is what works using your code examples:
First, import PyCaret:
from pycaret.classification import *
Then run setup:
clr_default = setup(df_rain_definitivo_one_drop_catboost_norm_fs_dropna, fold_shuffle=True, target='RainTomorrow', session_id=123)
Finally, use the compare_models method:
best_model = compare_models(sort='Accuracy')
After that you can create your models and then tune them.
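For example, a minimal sketch of that next step (the 'rf' estimator ID and the optimize metric here are illustrative choices, not from your question):
# create a single model; 'rf' (random forest) is just an example estimator ID
rf = create_model('rf')
# tune its hyperparameters, optimizing the same metric used for sorting
tuned_rf = tune_model(rf, optimize='Accuracy')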
I am trying to train a ByteLevelBPETokenizer using an iterable instead of from files. There must be something I am doing wrong when I instantiate the trainer, but I can't tell what it is. When I try to train the tokenizer with my dataset (clothing data from Kaggle) + the BpeTrainer, I get an error.
**TypeError**: 'tokenizers.trainers.BpeTrainer' object cannot be interpreted as an integer
I am using Colab
Step 1: Install tokenizers & download the Kaggle data
!pip install tokenizers
# Download clothing data from Kaggle
# https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/version/1?select=Womens+Clothing+E-Commerce+Reviews.csv
Step 2: Upload the file
# use colab file upload
from google.colab import files
uploaded = files.upload()
Step 3: Clean the data (remove floats) & run trainer
import io
import pandas as pd
# convert the csv to a dataframe so it can be parsed
data = io.BytesIO(uploaded['clothing_dataset.csv'])
df = pd.read_csv(data)
# convert the review text to a list so it can be passed as iterable to tokenizer
clothing_data = df['Review Text'].to_list()
# Remove float values from the data
clean_data = []
for item in clothing_data:
    if type(item) != float:
        clean_data.append(item)
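As an aside, those float entries are pandas NaN values for missing reviews, so the same cleanup can be done in one line:
# NaN is a float, so dropna() removes exactly those rows
clean_data = df['Review Text'].dropna().to_list()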
from tokenizers import ByteLevelBPETokenizer
from tokenizers.processors import BertProcessing
from tokenizers import trainers, pre_tokenizers
from tokenizers.trainers import BpeTrainer
from pathlib import Path
# Initialize a tokenizer
tokenizer = ByteLevelBPETokenizer(lowercase=True)
# Instantiate BpeTrainer
trainer = BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    show_progress=True,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
# Train the tokenizer
tokenizer.train_from_iterator(clean_data, trainer)
Error (I can see that the trainer really is a BpeTrainer type):
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-103-7738a7becb0e> in <module>()
34
35 # Train the tokenizer
---> 36 tokenizer.train_from_iterator(clean_data, trainer)
/usr/local/lib/python3.7/dist-packages/tokenizers/implementations/byte_level_bpe.py in train_from_iterator(self, iterator, vocab_size, min_frequency, show_progress, special_tokens)
119 show_progress=show_progress,
120 special_tokens=special_tokens,
--> 121 initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
122 )
123 self._tokenizer.train_from_iterator(iterator, trainer=trainer)
TypeError: 'tokenizers.trainers.BpeTrainer' object cannot be interpreted as an integer
Interesting note: if I instead pass it as the keyword argument trainer=trainer, I get this:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-104-64737f948e6d> in <module>()
34
35 # Train the tokenizer
---> 36 tokenizer.train_from_iterator(clean_data, trainer=trainer)
TypeError: train_from_iterator() got an unexpected keyword argument 'trainer'
I haven't used train_from_iterator before, but looking at the HF docs it seems you should use a generator function. So something like:
def clothing_generator():
    for item in clothing_data:
        if type(item) != float:
            yield item
Followed by:
tokenizer.train_from_iterator(clothing_generator(), trainer)
Might help?
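One more observation, going by the signature visible in your traceback (train_from_iterator(self, iterator, vocab_size, min_frequency, show_progress, special_tokens)): this implementation builds its own trainer internally, so passing your BpeTrainer positionally just lands it in the vocab_size slot, which would explain the "cannot be interpreted as an integer" message. A sketch of the call with the options passed directly instead of via a trainer:
tokenizer.train_from_iterator(
    clothing_generator(),
    vocab_size=20000,
    min_frequency=2,
    show_progress=True,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)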
I am using the ColumnTransformer for the first time, and I keep getting a TypeError. Is something wrong with the code?
code:
transformer = ColumnTransformer(('cat',OrdinalEncoder(),['job_industry_category','job_title','wealth_segmenr','gender']),
('numb',MinMaxScaler(),['tenure', 'age'])
data_impute = transformer.fit_transform(data_impute)
error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-109-ca10f9ee762b> in <module>
----> 1 data_impute = transformer.fit_transform(data_impute)
~\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
512 self._feature_names_in = None
513 X = _check_X(X)
--> 514 self._validate_transformers()
515 self._validate_column_callables(X)
516 self._validate_remainder(X)
~\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _validate_transformers(self)
271 return
272
--> 273 names, transformers, _ = zip(*self.transformers)
274
275 # validate names
TypeError: zip argument #2 must support iteration
A wrong construction of the ColumnTransformer object passes the wrong arguments to an internal zip call, which produces an error message that is useless to the mere mortal.
The constructor is:
class sklearn.compose.ColumnTransformer(transformers, *, remainder='drop', sparse_threshold=0.3, n_jobs=None, transformer_weights=None, verbose=False)
The wrong construction code passes transformers as ('cat', OrdinalEncoder(), ['job_industry_category', 'job_title', 'wealth_segmenr', 'gender']) and then remainder as ('numb', MinMaxScaler(), ['tenure', 'age']), and it fails miserably...
from an example in the docs:
ct = ColumnTransformer(
[("norm1", Normalizer(norm='l1'), [0, 1]),
("norm2", Normalizer(norm='l1'), slice(2, 4))])
You must pass a list of tuples, not the tuples as separate positional arguments.
fix:
transformer = ColumnTransformer([  # note the [ to start the list
    ('cat', OrdinalEncoder(), ['job_industry_category', 'job_title', 'wealth_segmenr', 'gender']),
    ('numb', MinMaxScaler(), ['tenure', 'age'])
])  # note the ] to end the list
The data science modules generally lack strong type hinting or type checking. Read the docs and copy/adapt the examples instead of starting from a blank page!
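For completeness, a runnable version of the fix with the imports it needs (a sketch; it assumes data_impute is a pandas DataFrame containing these columns, with 'wealth_segmenr' kept verbatim from the question):
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, MinMaxScaler

transformer = ColumnTransformer([
    ('cat', OrdinalEncoder(), ['job_industry_category', 'job_title', 'wealth_segmenr', 'gender']),
    ('numb', MinMaxScaler(), ['tenure', 'age'])
])
data_impute = transformer.fit_transform(data_impute)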
I'm working through a tutorial that is supposed to help students do the assignment, but I'm encountering a problem. I'm using Python in a notebook project on IBM's platform. Right now the section is simply data exploration, but this error keeps occurring and I'm not sure how to fix it; no one else seemed to have this problem in this class, and the teacher is rather slow to help, so I came here!
I tried just defining the variable before it's called, but no dice either way.
All the code prior to this just imports libraries and parses the data:
# Infer the data type of each column and convert the data to the inferred data type
from ingest import *
eu = ExtensionUtils(sqlContext)
df_data_1 = eu.convertTypes(df_data_1)
df_data_1.printSchema()
The error I'm getting is:
TypeError Traceback (most recent call last)
<ipython-input-14-33250ae79106> in <module>()
2 from ingest import *
3 eu = ExtensionUtils(sqlContext)
----> 4 df_data_1 = eu.convertTypes(df_data_1)
5 df_data_1.printSchema()
/opt/ibm/third-party/libs/python3/ingest/extension_utils.py in convertTypes(self, input_obj, dictVal)
304 """
305
--> 306 checkEnrichType_or_DataFrame("input_obj",input_obj)
307 self.logger = self._jLogger.getLogger(__name__)
308 methodname = str(inspect.stack()[0][3])
/opt/ibm/third-party/libs/python3/ingest/extension_utils.py in checkEnrichType_or_DataFrame(param, paramval)
81 if not isinstance(paramval,(EnrichType ,DataFrame)):
82 raise TypeError("%s should be a EnrichType class object or DataFrame, got type %s"
---> 83 % (str(param), type(paramval)))
84
85
TypeError: input_obj should be a EnrichType class object or DataFrame, got type <class 'NoneType'>
The solution was not in the code itself but in the notebook: a code snippet generated by a built-in function needed to be inserted before this cell.
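The reasoning: the traceback shows convertTypes received None, so df_data_1 was never actually bound to a DataFrame. A hypothetical sketch of the kind of cell that has to run first (in IBM notebooks this is typically generated via the data asset's "Insert to code" option; the format and path here are placeholders):
# hypothetical generated cell: df_data_1 must be a Spark DataFrame
# before convertTypes() is called
df_data_1 = sqlContext.read \
    .format('csv') \
    .option('header', 'true') \
    .load('your_data_file.csv')  # placeholder path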
I got an error like this:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-b9ac626e6121> in <module>
5
6 # Fitting TF-IDF to both training and test sets (semi-supervised learning)
----> 7 tfv.fit(list(xtrain) + list(xvalid))
8 xtrain_tfv = tfv.transform(xtrain)
9 xvalid_tfv = tfv.transform(xvalid)
TypeError: 'list' object is not callable
When I run this code in Python:
tfv = TfidfVectorizer(min_df=3, max_features=None,
                      strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}',
                      ngram_range=(1, 3), use_idf=1, smooth_idf=1, sublinear_tf=1,
                      stop_words='english')
# Fitting TF-IDF to both training and test sets (semi-supervised learning)
tfv.fit(list(xtrain) + list(xvalid))
xtrain_tfv = tfv.transform(xtrain)
xvalid_tfv = tfv.transform(xvalid)
P.S. I also tried converting xtrain to a list with xtrain.tolist(), but that didn't work for me either.
From the code you provided, nothing seems wrong. However, I suspect that somewhere before that block of code you assigned an object to the variable name list (most likely something along the lines of list = [...]), which is usually the cause of this error.
Try to find that line of code if it exists and rename the variable. In general it is a bad idea to shadow built-in names, for exactly this reason. For more info read this
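A minimal reproduction of the suspected cause (hypothetical; the variable contents are made up):
list = ['a', 'b']      # accidentally shadows the built-in list type
try:
    list("text")       # now fails: 'list' object is not callable
except TypeError as e:
    print(e)
del list               # removes the shadowing name; the built-in is visible again
print(list("ab"))      # works again: ['a', 'b']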
I am trying to use a simple apply on an SFrame full of data. This is a simple data transform on one of the columns, applying a function that takes a text input, splits it, and counts the words. Here is the function and its call/output:
In [1]: def count_words(txt):
            count = Counter()
            for word in txt.split():
                count[word] += 1
            return count
In [2]: products.apply(lambda x: count_words(x['review']))
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-8-85338326302c> in <module>()
----> 1 products.apply(lambda x: count_words(x['review']))
C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\data_structures\sframe.pyc in apply(self, fn, dtype, seed)
2607
2608 with cython_context():
-> 2609 return SArray(_proxy=self.__proxy__.transform(fn, dtype, seed))
2610
2611 def flat_map(self, column_names, fn, column_types='auto', seed=None):
C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\cython\context.pyc in __exit__(self, exc_type, exc_value, traceback)
47 if not self.show_cython_trace:
48 # To hide cython trace, we re-raise from here
---> 49 raise exc_type(exc_value)
50 else:
51 # To show the full trace, we do nothing and let exception propagate
RuntimeError: Runtime Exception. Unable to evaluate lambdas. Lambda workers did not start.
When I run my code I get that error. The SFrame (df) is only 10 by 2, so there should be no overload coming from there. I don't know how to fix this issue.
If you're using GraphLab Create, there is actually a built-in tool for doing this, in the "text analytics" toolkit. Let's say I have data like:
import graphlab
products = graphlab.SFrame({'review': ['a portrait of the artist as a young man',
'the sound and the fury']})
The easiest way to count the words in each entry is
products['counts'] = graphlab.text_analytics.count_words(products['review'])
If you're using the sframe package by itself, or if you want to do a custom function like the one you described, I think the key missing piece in your code is that the Counter needs to be converted into a dictionary in order for the SFrame to handle the output.
from collections import Counter

def count_words(txt):
    count = Counter()
    for word in txt.split():
        count[word] += 1
    return dict(count)

products['counts'] = products.apply(lambda x: count_words(x['review']))
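A quick sanity check on the toy SFrame above (illustrative output):
# each row of the new column is a plain dict of word counts
print(products['counts'][0])
# e.g. {'a': 2, 'portrait': 1, 'of': 1, 'the': 1, 'artist': 1, 'as': 1, 'young': 1, 'man': 1}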
For anyone who has come across this issue while using GraphLab, here is the discussion thread on the issue on Dato support:
http://forum.dato.com/discussion/1499/graphlab-create-using-anaconda-ipython-notebook-lambda-workers-did-not-start
Here is the code that can be run to fix this issue on a case-by-case basis.
After starting ipython or ipython notebook in the Dato/GraphLab environment, but before importing graphlab anywhere else, copy and run the following code (the snippet does the import itself so it can locate the install directory):
import ctypes, inspect, os, graphlab
from ctypes import wintypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
kernel32.SetDllDirectoryW.argtypes = (wintypes.LPCWSTR,)
src_dir = os.path.split(inspect.getfile(graphlab))[0]
kernel32.SetDllDirectoryW(src_dir)
# Should work
graphlab.SArray(range(1000)).apply(lambda x: x)
If this is run, the apply function should work fine with SFrame.