I am using TimeseriesGenerator for my problem.
The shapes for my train and test data are:
x_train - (306720, 20)
x_test - (306720,)
y_train - (4321, 20)
y_test - (4321,)
Their dtype is float64, and I don't need to use .to_numpy() anymore.
I then use TimeseriesGenerator:
train_data = TimeseriesGenerator(x_train, x_test, length=144, batch_size=100)
test_data = TimeseriesGenerator(y_train, y_test, length=144, batch_size=100)
When I try to run
GRU = keras.models.Sequential([keras.layers.GRU(100), keras.layers.Dense(32, activation= 'relu')])
GRU.compile(loss="mae", optimizer="adam")
resultsGRU = GRU.fit(train_data, test_data, epochs = 5)
I get the following error:
File ~\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File ~\Anaconda3\lib\site-packages\keras\engine\data_adapter.py:997, in KerasSequenceAdapter.__init__(self, x, y, sample_weights, shuffle, workers, use_multiprocessing, max_queue_size, model, **kwargs)
984 def __init__(
985 self,
986 x,
(...)
994 **kwargs
995 ):
996 if not is_none_or_empty(y):
--> 997 raise ValueError(
998 "`y` argument is not supported when using "
999 "`keras.utils.Sequence` as input."
1000 )
1001 if not is_none_or_empty(sample_weights):
1002 raise ValueError(
1003 "`sample_weight` argument is not supported when using "
1004 "`keras.utils.Sequence` as input."
1005 )
ValueError: `y` argument is not supported when using `keras.utils.Sequence` as input.
I tried
x, y = train_data[0]
print(x.shape, y.shape)
to inspect the batches before converting them to float for GRU.fit(), but I get this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:2131, in pandas._libs.hashtable.Int64HashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:2140, in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 4331
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Input In [205], in <cell line: 1>()
----> 1 x, y = train_data[0]
2 print(x.shape, y.shape)
File ~\Anaconda3\lib\site-packages\keras\preprocessing\sequence.py:189, in TimeseriesGenerator.__getitem__(self, index)
177 rows = np.arange(
178 i,
179 min(i + self.batch_size * self.stride, self.end_index + 1),
180 self.stride,
181 )
183 samples = np.array(
184 [
185 self.data[row - self.length : row : self.sampling_rate]
186 for row in rows
187 ]
188 )
--> 189 targets = np.array([self.targets[row] for row in rows])
191 if self.reverse:
192 return samples[:, ::-1, ...], targets
File ~\Anaconda3\lib\site-packages\keras\preprocessing\sequence.py:189, in <listcomp>(.0)
177 rows = np.arange(
178 i,
179 min(i + self.batch_size * self.stride, self.end_index + 1),
180 self.stride,
181 )
183 samples = np.array(
184 [
185 self.data[row - self.length : row : self.sampling_rate]
186 for row in rows
187 ]
188 )
--> 189 targets = np.array([self.targets[row] for row in rows])
191 if self.reverse:
192 return samples[:, ::-1, ...], targets
File ~\Anaconda3\lib\site-packages\pandas\core\series.py:958, in Series.__getitem__(self, key)
955 return self._values[key]
957 elif key_is_scalar:
--> 958 return self._get_value(key)
960 if is_hashable(key):
961 # Otherwise index.get_value will raise InvalidIndexError
962 try:
963 # For labels that don't resolve as scalars like tuples and frozensets
File ~\Anaconda3\lib\site-packages\pandas\core\series.py:1069, in Series._get_value(self, label, takeable)
1066 return self._values[label]
1068 # Similar to Index.get_value, but we do not fall back to positional
-> 1069 loc = self.index.get_loc(label)
1070 return self.index._get_values_for_loc(self, loc, label)
File ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:3623, in Index.get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)
KeyError: 4331
Can anyone please explain what is wrong?
My whole code worked fine before; I re-ran it to check that everything still works, and now I suddenly have this problem and don't know how to fix it.
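For reference, a minimal sketch of the likely fix, under two assumptions: the data/target pairings in the generators above are intended, and the inputs are now pandas objects (which is what the second traceback suggests). TimeseriesGenerator indexes its targets positionally, so a pandas Series with a label-based index raises the KeyError: 4331 seen above; keeping the .to_numpy() calls avoids that. And fit() does not accept a second Sequence as y, which is exactly what the first ValueError says; the test generator belongs in validation_data.
import keras
from keras.preprocessing.sequence import TimeseriesGenerator

# Pass NumPy arrays: the generator does targets[row] positionally, which
# fails on a pandas Series (the KeyError: 4331 above).
train_data = TimeseriesGenerator(x_train.to_numpy(), x_test.to_numpy(),
                                 length=144, batch_size=100)
test_data = TimeseriesGenerator(y_train.to_numpy(), y_test.to_numpy(),
                                length=144, batch_size=100)

GRU = keras.models.Sequential([keras.layers.GRU(100),
                               keras.layers.Dense(32, activation='relu')])
GRU.compile(loss="mae", optimizer="adam")

# A Sequence already yields (x, y) pairs, so fit() takes no separate y;
# the second generator goes in through validation_data instead.
resultsGRU = GRU.fit(train_data, validation_data=test_data, epochs=5)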
I am getting an error while predicting on the test data.
CODE:
from pycaret.anomaly import *
anom_exp = setup(train,session_id = 125,
categorical_features=['date', 'hours', 'weekNumber', 'DayName', 'isWeekday'],
numeric_features=['cpu_avg'],
ignore_features = ['Timestamp', 'time'])
sod = create_model('sod',fraction = 0.1)
sod_test= predict_model(model = sod, data = test)
ERROR:
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py:3629, in Index.get_loc(self, key, method, tolerance)
3628 try:
-> 3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
File ~/.local/lib/python3.10/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~/.local/lib/python3.10/site-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[20], line 1
----> 1 sod_test= predict_model(model = sod, data = test)
File ~/.local/lib/python3.10/site-packages/pycaret/anomaly/functional.py:941, in predict_model(model, data)
938 if experiment is None:
939 experiment = _EXPERIMENT_CLASS()
--> 941 return experiment.predict_model(estimator=model, data=data)
File ~/.local/lib/python3.10/site-packages/pycaret/anomaly/oop.py:87, in AnomalyExperiment.predict_model(self, estimator, data, ml_usecase)
48 def predict_model(
49 self, estimator, data: pd.DataFrame, ml_usecase: Optional[MLUsecase] = None
50 ) -> pd.DataFrame:
51 """
52 This function generates anomaly labels on using a trained model.
53
(...)
85
86 """
---> 87 return super().predict_model(estimator, data, ml_usecase)
File ~/.local/lib/python3.10/site-packages/pycaret/internal/pycaret_experiment/unsupervised_experiment.py:1354, in _UnsupervisedExperiment.predict_model(self, estimator, data, ml_usecase)
1351 else:
1352 raise TypeError("Model doesn't support predict parameter.")
-> 1354 pred = estimator.predict(data_transformed)
1355 if ml_usecase == MLUsecase.CLUSTERING:
1356 data_transformed["Cluster"] = [f"Cluster {i}" for i in pred]
File ~/.local/lib/python3.10/site-packages/pyod/models/base.py:165, in BaseDetector.predict(self, X, return_confidence)
144 """Predict if a particular sample is an outlier or not.
145
146 Parameters
(...)
161 Only if return_confidence is set to True.
162 """
164 check_is_fitted(self, ['decision_scores_', 'threshold_', 'labels_'])
--> 165 pred_score = self.decision_function(X)
166 prediction = (pred_score > self.threshold_).astype('int').ravel()
168 if return_confidence:
File ~/.local/lib/python3.10/site-packages/pyod/models/sod.py:157, in SOD.decision_function(self, X)
140 def decision_function(self, X):
141 """Predict raw anomaly score of X using the fitted detector.
142 The anomaly score of an input sample is computed based on different
143 detector algorithms. For consistency, outliers are assigned with
(...)
155 The anomaly score of the input samples.
156 """
--> 157 return self._sod(X)
File ~/.local/lib/python3.10/site-packages/pyod/models/sod.py:187, in SOD._sod(self, X)
185 anomaly_scores = np.zeros(shape=(X.shape[0],))
186 for i in range(X.shape[0]):
--> 187 obs = X[i]
188 ref = X[ref_inds[i,],]
189 means = np.mean(ref, axis=0) # mean of each column
File ~/.local/lib/python3.10/site-packages/pandas/core/frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py:3631, in Index.get_loc(self, key, method, tolerance)
3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
-> 3631 raise KeyError(key) from err
3632 except TypeError:
3633 # If we have a listlike key, _check_indexing_error will raise
3634 # InvalidIndexError. Otherwise we fall through and re-raise
3635 # the TypeError.
3636 self._check_indexing_error(key)
KeyError: 0
I looked at the source code and I think I know where the problem is, but my edit did not make it predict anomalies correctly.
source code: https://pyod.readthedocs.io/en/latest/_modules/pyod/models/sod.html
There is a decision_function which is defined as:
def decision_function(self, X):
    return self._sod(X)
What I think is the problem: the DataFrame X should be converted to an array with check_array(X) before it is passed to the _sod function.
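If that diagnosis is right, a minimal workaround sketch (an assumption, not a confirmed fix) is to patch the method from outside, so the DataFrame is coerced to an array before _sod indexes it positionally:
from sklearn.utils import check_array
from pyod.models.sod import SOD

def _patched_decision_function(self, X):
    # Coerce a DataFrame to an ndarray so X[i] inside _sod selects row i
    # positionally instead of looking up label 0 (the KeyError: 0 above).
    return self._sod(check_array(X))

SOD.decision_function = _patched_decision_function

# Then re-run the failing call as before:
# sod_test = predict_model(model=sod, data=test)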
I was trying to solve this problem:
ValueError: `logits` and `labels` must have the same shape, received ((None, 1) vs (None, 2))
which appears when I try to run this code:
tuner.search(x_train, y_train, batch_size=50,
validation_data = (x_test, y_test),
epochs=100, callbacks=[stop_early])
And I found this here on Stack Overflow:
tuner.search(x_train, [y_train[:, 0], y_train[:, 1]], batch_size=50,
validation_data = (x_test, [y_test[:, 0], y_test[:, 1]] ),
epochs=100, callbacks=[stop_early])
Now I'm getting this error:
AttributeError Traceback (most recent call last)
<ipython-input-28-73685d0b2fe7> in <module>
1 tuner.search(x_train, [y_train[:, 0], y_train[:, 1]], batch_size=50,
2 validation_data = (x_test, [y_test[:, 0], y_test[:, 1]] ),
----> 3 epochs=100, callbacks=[stop_early])
6 frames
/usr/local/lib/python3.7/dist-packages/keras_tuner/engine/base_tuner.py in search(self, *fit_args, **fit_kwargs)
181
182 self.on_trial_begin(trial)
--> 183 results = self.run_trial(trial, *fit_args, **fit_kwargs)
184 # `results` is None indicates user updated oracle in `run_trial()`.
185 if results is None:
/usr/local/lib/python3.7/dist-packages/keras_tuner/tuners/hyperband.py in run_trial(self, trial, *fit_args, **fit_kwargs)
382 fit_kwargs["epochs"] = hp.values["tuner/epochs"]
383 fit_kwargs["initial_epoch"] = hp.values["tuner/initial_epoch"]
--> 384 return super(Hyperband, self).run_trial(trial, *fit_args, **fit_kwargs)
385
386 def _build_model(self, hp):
/usr/local/lib/python3.7/dist-packages/keras_tuner/engine/tuner.py in run_trial(self, trial, *args, **kwargs)
293 callbacks.append(model_checkpoint)
294 copied_kwargs["callbacks"] = callbacks
--> 295 obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
296
297 histories.append(obj_value)
/usr/local/lib/python3.7/dist-packages/keras_tuner/engine/tuner.py in _build_and_fit_model(self, trial, *args, **kwargs)
220 hp = trial.hyperparameters
221 model = self._try_build(hp)
--> 222 results = self.hypermodel.fit(hp, model, *args, **kwargs)
223 tuner_utils.validate_trial_results(
224 results, self.oracle.objective, "HyperModel.fit()"
/usr/local/lib/python3.7/dist-packages/keras_tuner/engine/hypermodel.py in fit(self, hp, model, *args, **kwargs)
138 If return a float, it should be the `objective` value.
139 """
--> 140 return model.fit(*args, **kwargs)
141
142
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
1127 except Exception as e: # pylint:disable=broad-except
1128 if hasattr(e, "ag_error_metadata"):
-> 1129 raise e.ag_error_metadata.to_exception(e)
1130 else:
1131 raise
AttributeError: 'tuple' object has no attribute 'shape'
It seems like epochs=100 and callbacks=[stop_early] are the problem, but this error wasn't happening before I added the modifications for the first problem. How could I solve this?
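For reference, one hedged way out, assuming the hypermodel has a single 1-unit output (which the (None, 1) in the original ValueError suggests) and y_train/y_test are one-hot with two columns: collapse the labels to one column instead of splitting them into a list, since a list of targets only fits a multi-output model.
import numpy as np

# Sketch only: assumes binary classification with one-hot (N, 2) labels and
# a final Dense(1, activation="sigmoid") layer in the hypermodel.
y_train_1d = np.argmax(y_train, axis=1).astype("float32")  # (N, 2) -> (N,)
y_test_1d = np.argmax(y_test, axis=1).astype("float32")

tuner.search(x_train, y_train_1d, batch_size=50,
             validation_data=(x_test, y_test_1d),
             epochs=100, callbacks=[stop_early])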
I have been working on the Kaggle competition Spaceship Titanic and trying to get my hands dirty with more sophisticated techniques for imputing missing values. I am trying to impute the missing values of a categorical column named Cabin using the datawig SimpleImputer, but it gives me an error. My dataset is as follows:
dataset.tail(555)
I used the following code to split the data into training and testing sets in order to train the datawig SimpleImputer:
import datawig
dataset_Cabin_train = dataset[dataset["Cabin"].isnull()==False]
dataset_Cabin_test = dataset[dataset["Cabin"].isnull()==True]
Cabins_train_columsn = dataset_Cabin_train.columns.drop("Cabin")
dataset_Cabin_train["Cabin"]
imputer = datawig.SimpleImputer(input_columns=Cabins_train_columsn,output_column="Cabin")
imputer.fit(dataset_Cabin_train)
However, fitting on this dataset gives me the following error:
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in _na_arithmetic_op(left, right, op, is_cmp)
165 try:
--> 166 result = func(left, right)
167 except TypeError:
~\Anaconda3\lib\site-packages\pandas\core\roperator.py in radd(left, right)
8 def radd(left, right):
----> 9 return right + left
10
TypeError: can only concatenate str (not "bool") to str
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-42-743c6e627a27> in <module>
1 imputer = datawig.SimpleImputer(input_columns=Cabins_train_columsn,output_column="Cabin")
----> 2 imputer.fit(dataset_Cabin_train)
~\Anaconda3\lib\site-packages\datawig\simple_imputer.py in fit(self, train_df, test_df, ctx, learning_rate, num_epochs, patience, test_split, weight_decay, batch_size, final_fc_hidden_units, calibrate, class_weights, instance_weights)
388 weight_decay, batch_size,
389 final_fc_hidden_units=final_fc_hidden_units,
--> 390 calibrate=calibrate)
391 self.save()
392
~\Anaconda3\lib\site-packages\datawig\imputer.py in fit(self, train_df, test_df, ctx, learning_rate, num_epochs, patience, test_split, weight_decay, batch_size, final_fc_hidden_units, calibrate)
261 train_df, test_df = random_split(train_df, [1.0 - test_split, test_split])
262
--> 263 iter_train, iter_test = self.__build_iterators(train_df, test_df, test_split)
264
265 self.__check_data(test_df)
~\Anaconda3\lib\site-packages\datawig\imputer.py in __build_iterators(self, train_df, test_df, test_split)
594 data_columns=self.data_encoders,
595 label_columns=self.label_encoders,
--> 596 batch_size=self.batch_size
597 )
598
~\Anaconda3\lib\site-packages\datawig\iterators.py in __init__(self, data_frame, data_columns, label_columns, batch_size)
238 column_encoder.fit(data_frame)
239
--> 240 self.df_iterator = self.mxnet_iterator_from_df(data_frame)
241 self.df_iterator.reset()
242 self._provide_data = self.df_iterator.provide_data
~\Anaconda3\lib\site-packages\datawig\iterators.py in mxnet_iterator_from_df(self, data_frame)
106 data = {}
107 for col_enc in self.data_columns:
--> 108 data_array_numpy = col_enc.transform(data_frame)
109 data[col_enc.output_column] = mx.nd.array(data_array_numpy[:n_samples, :])
110 logger.debug("Data Encoding - Encoded {} rows of column \
~\Anaconda3\lib\site-packages\datawig\column_encoders.py in transform(self, data_frame)
610 for col in self.input_columns:
611 if self.prefixed_concatenation:
--> 612 tmp_col += col + " " + data_frame[col].fillna("") + " "
613 else:
614 tmp_col += data_frame[col].fillna("") + " "
~\Anaconda3\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
~\Anaconda3\lib\site-packages\pandas\core\arraylike.py in __radd__(self, other)
94 @unpack_zerodim_and_defer("__radd__")
95 def __radd__(self, other):
---> 96 return self._arith_method(other, roperator.radd)
97
98 @unpack_zerodim_and_defer("__sub__")
~\Anaconda3\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
222 _bool_arith_check(op, left, right)
223
--> 224 res_values = _na_arithmetic_op(left, right, op)
225
226 return res_values
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in _na_arithmetic_op(left, right, op, is_cmp)
171 # Don't do this for comparisons, as that will handle complex numbers
172 # incorrectly, see GH#32047
--> 173 result = _masked_arith_op(left, right, op)
174 else:
175 raise
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in _masked_arith_op(x, y, op)
129
130 if mask.any():
--> 131 result[mask] = op(xrav[mask], y)
132
133 np.putmask(result, ~mask, np.nan)
~\Anaconda3\lib\site-packages\pandas\core\roperator.py in radd(left, right)
7
8 def radd(left, right):
----> 9 return right + left
10
11
TypeError: can only concatenate str (not "bool") to str
Can anyone tell me what I am doing wrong? Thanks!
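One hedged guess at the cause, given the final TypeError: can only concatenate str (not "bool") to str: datawig concatenates the input columns into one string, and a column of Python bools (the Spaceship Titanic data has several, e.g. CryoSleep and VIP) breaks that concatenation. A minimal workaround sketch is to cast such columns to strings before fitting:
import pandas as pd

def is_bool_like(s):
    # bool dtype, or an object column whose non-null values are Python bools
    non_null = s.dropna()
    return s.dtype == bool or (len(non_null) > 0 and
                               non_null.map(lambda v: isinstance(v, bool)).all())

dataset_Cabin_train = dataset_Cabin_train.copy()
for col in Cabins_train_columsn:
    if is_bool_like(dataset_Cabin_train[col]):
        # str(v) per value, keeping NaN as NaN so datawig's fillna("") still applies
        dataset_Cabin_train[col] = dataset_Cabin_train[col].map(
            lambda v: str(v) if pd.notna(v) else v)

imputer = datawig.SimpleImputer(input_columns=Cabins_train_columsn,
                                output_column="Cabin")
imputer.fit(dataset_Cabin_train)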
I am a noob in Python and TensorFlow, and I ran into a problem while training a TensorFlow Lite model in Colab.
The model performed well before I exported it.
However, when I test the .tflite file on my images using the following code, I get an error:
model.evaluate_tflite('/content/label-img/model.tflite', validation_data)
Error:
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-22-6548d36b036c> in <module>()
----> 1 model.evaluate_tflite('/content/label-img/model.tflite', validation_data)
8 frames
/usr/local/lib/python3.7/dist-packages/tensorflow_examples/lite/model_maker/core/task/object_detector.py in evaluate_tflite(self, tflite_filepath, data)
187 ds = data.gen_dataset(self.model_spec, batch_size=1, is_training=False)
188 return self.model_spec.evaluate_tflite(tflite_filepath, ds, len(data),
--> 189 data.annotations_json_file)
190
191 def _export_saved_model(self, saved_model_dir: str) -> None:
/usr/local/lib/python3.7/dist-packages/tensorflow_examples/lite/model_maker/core/task/model_spec/object_detector_spec.py in evaluate_tflite(self, tflite_filepath, dataset, steps, json_file)
386 normalize_factor = tf.constant([height, width, height, width],
387 dtype=tf.float32)
--> 388 nms_boxes *= normalize_factor
389 if labels['image_scales'] is not None:
390 scales = tf.expand_dims(tf.expand_dims(labels['image_scales'], -1), -1)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/math_ops.py in r_binary_op_wrapper(y, x)
1398 # r_binary_op_wrapper use different force_same_dtype values.
1399 y, x = maybe_promote_tensors(y, x)
-> 1400 return func(x, y, name=name)
1401
1402 # Propagate func.__doc__ to the wrappers
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/math_ops.py in _mul_dispatch(x, y, name)
1708 return sparse_tensor.SparseTensor(y.indices, new_vals, y.dense_shape)
1709 else:
-> 1710 return multiply(x, y, name=name)
1711
1712
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
204 """Call target, and fall back on dispatchers if there is a TypeError."""
205 try:
--> 206 return target(*args, **kwargs)
207 except (TypeError, ValueError):
208 # Note: convert_to_eager_tensor currently raises a ValueError, not a
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/math_ops.py in multiply(x, y, name)
528 """
529
--> 530 return gen_math_ops.mul(x, y, name)
531
532
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_math_ops.py in mul(x, y, name)
6234 return _result
6235 except _core._NotOkStatusException as e:
-> 6236 _ops.raise_from_not_ok_status(e, name)
6237 except _core._FallbackException:
6238 pass
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
6939 message = e.message + (" name: " + name if name is not None else "")
6940 # pylint: disable=protected-access
-> 6941 six.raise_from(core._status_to_exception(e.code, message), None)
6942 # pylint: enable=protected-access
6943
/usr/local/lib/python3.7/dist-packages/six.py in raise_from(value, from_value)
InvalidArgumentError: required broadcastable shapes [Op:Mul]
This error seems to be caused by a shape mismatch, but can anyone tell me how to make the shapes broadcastable in Colab?
The code and Colab notebook are from the official TensorFlow tutorial: https://colab.research.google.com/github/googlecodelabs/odml-pathways/blob/main/object-detection/codelab2/python/Train_a_salad_detector_with_TFLite_Model_Maker.ipynb#scrollTo=HD5BvzWe6YKa
Currently, the above colab has an issue with TensorFlow 2.6 or above. Please stick with TensorFlow 2.5 for the time being.
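A minimal sketch of that pin, to run in the first Colab cell before any imports (the exact patch version is an assumption consistent with the advice above):
# Restart the Colab runtime after installing, then re-run the notebook.
!pip install -q tensorflow==2.5.0
!pip install -q tflite-model-maker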
I am getting KeyError: 0 when running this code in Python:
full_pipeline.fit(X_train, y_train)
Here is the complete code:
from gensim.sklearn_api import D2VTransformer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
name_pipeline = Pipeline( steps = [
( 'feature_selector', FeatureSelector(['name']) ),
( 'feature_transformer', D2VTransformer() ) ] )
description_pipeline = Pipeline( steps = [
( 'feature_selector', FeatureSelector(['description']) ),
( 'feature_transformer', D2VTransformer() ) ] )
X_pipeline = FeatureUnion( transformer_list = [
( 'name_pipeline', name_pipeline ),
( 'description_pipeline', description_pipeline ) ] )
#Split up the train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
clf = LogisticRegression(random_state=0, class_weight='balanced', solver='lbfgs', max_iter=1000, multi_class='multinomial')
full_pipeline = Pipeline( steps =
[ ( 'pipeline', X_pipeline),
( 'model', clf ) ] )
full_pipeline.fit(X_train, y_train)
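(FeatureSelector is not shown above; a hypothetical stand-in along these lines is assumed here, so the pipeline is self-contained.)
from sklearn.base import BaseEstimator, TransformerMixin

class FeatureSelector(BaseEstimator, TransformerMixin):
    # Hypothetical reconstruction: selects the given columns from a DataFrame.
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.columns]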
And here is the error I'm getting:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
19 frames
<ipython-input-14-0ddbaedffb67> in <module>()
25 ( 'model', clf ) ] )
26
---> 27 full_pipeline.fit(X_train, y_train)
/usr/local/lib/python3.6/dist-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
350 This estimator
351 """
--> 352 Xt, fit_params = self._fit(X, y, **fit_params)
353 with _print_elapsed_time('Pipeline',
354 self._log_message(len(self.steps) - 1)):
/usr/local/lib/python3.6/dist-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
315 message_clsname='Pipeline',
316 message=self._log_message(step_idx),
--> 317 **fit_params_steps[name])
318 # Replace the transformer of the step with the fitted
319 # transformer. This is necessary when loading the transformer
/usr/local/lib/python3.6/dist-packages/joblib/memory.py in __call__(self, *args, **kwargs)
353
354 def __call__(self, *args, **kwargs):
--> 355 return self.func(*args, **kwargs)
356
357 def call_and_shelve(self, *args, **kwargs):
/usr/local/lib/python3.6/dist-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
714 with _print_elapsed_time(message_clsname, message):
715 if hasattr(transformer, 'fit_transform'):
--> 716 res = transformer.fit_transform(X, y, **fit_params)
717 else:
718 res = transformer.fit(X, y, **fit_params).transform(X)
/usr/local/lib/python3.6/dist-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
910 sum of n_components (output dimension) over transformers.
911 """
--> 912 results = self._parallel_func(X, y, fit_params, _fit_transform_one)
913 if not results:
914 # All transformers are None
/usr/local/lib/python3.6/dist-packages/sklearn/pipeline.py in _parallel_func(self, X, y, fit_params, func)
940 message=self._log_message(name, idx, len(transformers)),
941 **fit_params) for idx, (name, transformer,
--> 942 weight) in enumerate(transformers, 1))
943
944 def transform(self, X):
/usr/local/lib/python3.6/dist-packages/joblib/parallel.py in __call__(self, iterable)
1001 # remaining jobs.
1002 self._iterating = False
-> 1003 if self.dispatch_one_batch(iterator):
1004 self._iterating = self._original_iterator is not None
1005
/usr/local/lib/python3.6/dist-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
832 return False
833 else:
--> 834 self._dispatch(tasks)
835 return True
836
/usr/local/lib/python3.6/dist-packages/joblib/parallel.py in _dispatch(self, batch)
751 with self._lock:
752 job_idx = len(self._jobs)
--> 753 job = self._backend.apply_async(batch, callback=cb)
754 # A job can complete so quickly than its callback is
755 # called before we get here, causing self._jobs to
/usr/local/lib/python3.6/dist-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
199 def apply_async(self, func, callback=None):
200 """Schedule a func to be run"""
--> 201 result = ImmediateResult(func)
202 if callback:
203 callback(result)
/usr/local/lib/python3.6/dist-packages/joblib/_parallel_backends.py in __init__(self, batch)
580 # Don't delay the application, to avoid keeping the input
581 # arguments in memory
--> 582 self.results = batch()
583
584 def get(self):
/usr/local/lib/python3.6/dist-packages/joblib/parallel.py in __call__(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
/usr/local/lib/python3.6/dist-packages/joblib/parallel.py in <listcomp>(.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
/usr/local/lib/python3.6/dist-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
714 with _print_elapsed_time(message_clsname, message):
715 if hasattr(transformer, 'fit_transform'):
--> 716 res = transformer.fit_transform(X, y, **fit_params)
717 else:
718 res = transformer.fit(X, y, **fit_params).transform(X)
/usr/local/lib/python3.6/dist-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
391 return Xt
392 if hasattr(last_step, 'fit_transform'):
--> 393 return last_step.fit_transform(Xt, y, **fit_params)
394 else:
395 return last_step.fit(Xt, y, **fit_params).transform(Xt)
/usr/local/lib/python3.6/dist-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
554 else:
555 # fit method of arity 2 (supervised transformation)
--> 556 return self.fit(X, y, **fit_params).transform(X)
557
558
/usr/local/lib/python3.6/dist-packages/gensim/sklearn_api/d2vmodel.py in fit(self, X, y)
158
159 """
--> 160 if isinstance(X[0], doc2vec.TaggedDocument):
161 d2v_sentences = X
162 else:
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __getitem__(self, key)
2993 if self.columns.nlevels > 1:
2994 return self._getitem_multilevel(key)
-> 2995 indexer = self.columns.get_loc(key)
2996 if is_integer(indexer):
2997 indexer = [indexer]
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
Does anyone know why this might happen? I think it has to do with D2VTransformer, because when I run the code below I get the same error:
model = D2VTransformer(min_count=1, size=5)
docvecs = model.fit_transform(X_train)
But when trying to select only one column from the dataframe:
docvecs = model.fit_transform(X_train['name'])
it doesn't throw an error. That is why I used only one column in each pipeline when I created them, but I am still getting the error.
This is how X_train looks:
name description
9107 way great entrepreneur push limit help succeed way great entrepreneur push limit
7706 dit het team week week dit het team week week
3995 decorate home jewel tone feel bold colour choice inspire fill home abun...
5220 attic meat district attic meat district
3412 tee apparel choose design item clothe accessory piece inde...
... ... ...
3830 marque web designer mode marque web designer
3261 design holiday rest bite try lear magazine dai... design holiday rest bite try lear
2415 hallucinatory house father spirit music room hold tower season rug produce early...
7223 jacket rise jacket rise
4697 cupcake bake explorer love love chocolate cupcake top kind easy foll...
And some more details about X_train:
X_train.shape
(7159, 2)
X_train.dtypes
name object
description object
dtype: object
It looks like there was a recent bug and fix in gensim (October 2019, and not yet in any official release) to make D2VTransformer more tolerant of some Pandas Series as data sources, to resolve exactly the same exception as you've hit.
The line of code changed is exactly the one shown in your extended error-stack - line 160 of d2vmodel.py, testing X[0].
I would suggest grabbing the raw source of the latest version of d2vmodel.py to use locally (instead of importing from gensim.sklearn_api), and checking whether that resolves your issue. See:
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/sklearn_api/d2vmodel.py
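Until a release includes that fix, a hedged workaround sketch (assumption: the KeyError: 0 comes from X[0] doing a label lookup on a pandas object whose index lacks 0) is to hand D2VTransformer plain Python lists, which makes X[0] ordinary list indexing:
from gensim.sklearn_api import D2VTransformer

# Tokenize the text column into lists of words and drop the pandas index.
name_docs = X_train['name'].str.split().tolist()

model = D2VTransformer(min_count=1, size=5)
docvecs = model.fit_transform(name_docs)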