ValueError: Classification metrics unable to handle multiclass - python

I am trying to build an object classification model, but when I try to print out the classification report, it returns a value error:
ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets
This is my current code:
train_size = int(len(df) * 0.7)
train_text = df['cleansed_text'][:train_size]
train_cat = df['category'][:train_size]

test_text = df['cleansed_text'][train_size:]
test_cat = df['category'][train_size:]

max_words = 2500
tokenize = text.Tokenizer(num_words=max_words, char_level=False)
tokenize.fit_on_texts(train_text)
x_train = tokenize.texts_to_matrix(train_text)
x_test = tokenize.texts_to_matrix(test_text)

encoder = LabelEncoder()
encoder.fit(train_cat)
y_train = encoder.transform(train_cat)
y_test = encoder.transform(test_cat)

num_classes = np.max(y_train) + 1
y_train = utils.to_categorical(y_train, num_classes)
y_test = utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(256, input_shape=(max_words,)))
model.add(Dropout(0.5))
model.add(Dense(256))
model.add(Dropout(0.5))
model.add(Activation('relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

history = model.fit(x_train, y_train,
                    batch_size=32,
                    epochs=10,
                    verbose=1,
                    validation_split=0.1)

from sklearn.metrics import classification_report

y_test_arg = np.argmax(y_test, axis=1)
Y_pred = np.argmax(model.predict(x_test), axis=1)
print('Confusion Matrix')
print(confusion_matrix(y_test_arg, Y_pred))
print(classification_report(y_test_arg, y_pred, labels=[1, 2, 3, 4, 5]))
However, when I attempt to print out the classification report, I run into this error:
21/21 [==============================] - 0s 2ms/step
Confusion Matrix
[[138 1 6 0 2]
[ 0 102 3 0 2]
[ 3 2 121 1 2]
[ 1 0 1 157 0]
[ 0 3 0 0 123]]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [56], in <cell line: 8>()
5 print('Confusion Matrix')
6 print(confusion_matrix(y_test_arg, Y_pred))
----> 8 print(classification_report(y_test_arg, y_pred, labels=[1,2,3,4,5]))
File ~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:2110, in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict, zero_division)
1998 def classification_report(
1999 y_true,
2000 y_pred,
(...)
2007 zero_division="warn",
2008 ):
2009 """Build a text report showing the main classification metrics.
2010
2011 Read more in the :ref:`User Guide <classification_report>`.
(...)
2107 <BLANKLINE>
2108 """
-> 2110 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
2112 if labels is None:
2113 labels = unique_labels(y_true, y_pred)
File ~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:93, in _check_targets(y_true, y_pred)
90 y_type = {"multiclass"}
92 if len(y_type) > 1:
---> 93 raise ValueError(
94 "Classification metrics can't handle a mix of {0} and {1} targets".format(
95 type_true, type_pred
96 )
97 )
99 # We can't have more than one value on y_type => The set is no more needed
100 y_type = y_type.pop()
ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets
y_test_arg
array([3, 3, 1, 0, 4, 1, 0, 4, 3, 4, 1, 1, 2, 2, 3, 0, 0, 4, 1, 3, 2, 0,
4, 1, 2, 3, 1, 2, 2, 4, 3, 2, 0, 2, 1, 4, 3, 2, 1, 1, 0, 3, 4, 4,
3, 1, 4, 2, 4, 3, 2, 2, 3, 1, 3, 2, 3, 4, 1, 3, 1, 0, 0, 1, 1, 1,
4, 3, 0, 0, 2, 2, 0, 2, 1, 3, 3, 4, 2, 3, 0, 3, 0, 4, 3, 3, 0, 1,
3, 3, 4, 3, 0, 2, 0, 1, 4, 1, 2, 0, 1, 2, 1, 2, 2, 0, 3, 3, 3, 4,
4, 3, 2, 1, 4, 3, 1, 0, 1, 2, 0, 3, 4, 0, 3, 2, 0, 1, 1, 1, 2, 1,
2, 1, 3, 1, 3, 2, 2, 0, 2, 4, 3, 4, 3, 0, 2, 4, 1, 1, 2, 1, 2, 3,
3, 2, 0, 4, 3, 2, 2, 1, 3, 2, 2, 0, 4, 4, 0, 4, 3, 3, 0, 2, 0, 4,
3, 4, 2, 1, 3, 0, 3, 1, 4, 4, 3, 2, 3, 0, 3, 0, 3, 3, 1, 1, 0, 4,
4, 0, 4, 0, 0, 3, 3, 2, 3, 4, 3, 4, 3, 3, 0, 0, 4, 3, 0, 4, 4, 2,
3, 0, 1, 1, 4, 2, 3, 3, 4, 0, 4, 1, 1, 2, 2, 0, 1, 3, 1, 1, 0, 3,
2, 4, 0, 3, 1, 4, 2, 2, 3, 3, 0, 0, 0, 0, 0, 1, 0, 2, 2, 4, 4, 1,
2, 1, 0, 2, 3, 3, 0, 4, 0, 4, 3, 0, 0, 2, 3, 3, 2, 2, 1, 1, 2, 0,
2, 2, 0, 4, 2, 2, 2, 2, 2, 1, 1, 4, 2, 3, 2, 3, 4, 3, 3, 3, 1, 4,
1, 4, 3, 4, 3, 3, 1, 1, 0, 1, 1, 2, 0, 3, 4, 4, 2, 0, 3, 0, 1, 3,
2, 1, 3, 3, 0, 2, 4, 4, 0, 0, 3, 2, 1, 3, 3, 2, 1, 4, 3, 1, 0, 2,
3, 2, 4, 1, 3, 2, 0, 1, 2, 1, 2, 3, 2, 0, 0, 2, 0, 4, 3, 0, 1, 0,
3, 3, 1, 4, 2, 4, 2, 2, 3, 3, 3, 0, 4, 1, 0, 3, 0, 3, 0, 4, 0, 0,
0, 0, 3, 3, 3, 0, 0, 1, 0, 0, 0, 3, 3, 3, 4, 0, 3, 3, 3, 0, 1, 4,
4, 4, 2, 0, 0, 4, 0, 4, 3, 3, 2, 2, 2, 3, 3, 2, 2, 4, 0, 3, 3, 3,
3, 0, 3, 0, 0, 0, 0, 3, 2, 3, 4, 4, 3, 4, 0, 1, 0, 3, 0, 4, 4, 2,
1, 0, 1, 0, 4, 2, 1, 2, 1, 1, 4, 0, 4, 4, 0, 2, 3, 1, 0, 2, 1, 0,
4, 3, 4, 2, 3, 2, 0, 2, 2, 0, 0, 0, 4, 2, 0, 2, 0, 1, 2, 3, 2, 2,
3, 1, 4, 4, 0, 4, 3, 0, 0, 2, 3, 4, 4, 4, 3, 1, 3, 2, 0, 2, 2, 1,
4, 0, 4, 3, 1, 1, 3, 0, 1, 4, 4, 3, 1, 0, 2, 2, 2, 4, 4, 0, 2, 0,
2, 2, 1, 3, 4, 0, 4, 1, 4, 4, 3, 2, 3, 3, 2, 1, 1, 0, 2, 2, 3, 0,
0, 4, 0, 4, 4, 3, 0, 2, 3, 0, 0, 3, 4, 3, 4, 1, 3, 3, 1, 0, 4, 3,
3, 2, 4, 0, 2, 3, 3, 2, 1, 4, 4, 4, 0, 3, 1, 1, 4, 0, 2, 4, 3, 3,
4, 4, 2, 0, 3, 1, 1, 3, 1, 4, 4, 0, 0, 0, 3, 3, 4, 3, 0, 4, 0, 0,
3, 0, 2, 0, 0, 4, 0, 4, 2, 4, 1, 2, 4, 1, 3, 2, 1, 0, 4, 0, 4, 1,
4, 3, 0, 0, 2, 1, 2, 3], dtype=int64)
y_pred
array([[2.6148611e-05, 1.2884392e-06, 8.0136197e-06, 9.9993646e-01,
2.8027451e-05],
[1.1888630e-08, 1.9621881e-07, 6.0117927e-08, 9.9999917e-01,
4.2087538e-07],
[2.4368815e-06, 9.9999702e-01, 2.0465748e-07, 9.2730332e-08,
2.5044619e-07],
...,
[8.7212893e-04, 9.9891293e-01, 7.5106349e-05, 7.0842376e-05,
6.8954141e-05],
[1.2511186e-02, 5.9731454e-05, 9.8512655e-01, 3.0246837e-04,
2.0000227e-03],
[5.9550672e-07, 7.1766672e-06, 2.0012515e-06, 9.9999011e-01,
1.1376539e-07]], dtype=float32)

Your problem is caused by passing continuous multi-output values to classification_report. The error comes from this code:
y_test_arg = np.argmax(y_test, axis=1)
Y_pred = np.argmax(model.predict(x_test), axis=1)
Note the capitalisation: the argmaxed class labels are stored in Y_pred, but classification_report is called with the lowercase y_pred, which (as your dump shows) still holds the raw softmax probabilities from model.predict(). Pass the discrete class labels in Y_pred instead.
You can see this related question.
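For reference, a minimal sketch of the corrected call, using the variables defined above (note that LabelEncoder maps the five categories to 0-4, as your y_test_arg dump shows, so that is what the labels argument should list):
from sklearn.metrics import classification_report, confusion_matrix

# Collapse one-hot targets and softmax outputs to discrete class indices
y_test_arg = np.argmax(y_test, axis=1)
Y_pred = np.argmax(model.predict(x_test), axis=1)

print('Confusion Matrix')
print(confusion_matrix(y_test_arg, Y_pred))
print(classification_report(y_test_arg, Y_pred, labels=[0, 1, 2, 3, 4]))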

Related

TypeError: object of type 'NoneType' has no len() when using KerasClassifier

I want to build a logistic regression model using Keras, train it for X epochs, and obtain the accuracy and loss scores from the model.
My code raised TypeError: object of type 'NoneType' has no len(). However, X_train[cv_train] and y_train[cv_train] are not NoneType.
Code:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)

def build_logistic_regression_model():
    model = Sequential()
    model.add(Dense(units=1, kernel_initializer='glorot_uniform', activation='sigmoid', kernel_regularizer=l2(0.)))
    # Performance visualization callback
    performance_viz_cbk = PerformanceVisualizationCallback(model=model,
                                                           validation_data=X_val,
                                                           dat_dir='c:\performance_charts')
    model.compile(optimizer='sgd',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

lrscores = []
train_lrscores = []
for cv_train, cv_val in kfold.split(X_train, y_train):
    lr_model_logit = KerasClassifier(build_fn=build_logistic_regression_model, batch_size=10)
    hist = lr_model_logit.fit(X_train[cv_train], y_train[cv_train], epochs=200).history_
    losses = hist["mean_absolute_error"]
    train_lrscores.append(hist * 100)
    lr_score = hist.score(X_val, y_val)
    lrscores.append(lr_score * 100)
Traceback:
/opt/conda/lib/python3.7/site-packages/scikeras/wrappers.py:302: UserWarning: ``build_fn`` will be renamed to ``model`` in a future release, at which point use of ``build_fn`` will raise an Error instead.
"``build_fn`` will be renamed to ``model`` in a future release,"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_18384/2762271288.py in <module>
3 for cv_train, cv_val in kfold.split(X_train, y_train):
4 lr_model_logit = KerasClassifier(build_fn=build_logistic_regression_model, batch_size = 10)
----> 5 hist = lr_model_logit.fit(X_train[cv_train], y_train[cv_train], epochs=200).history_
6 losses = hist["mean_absolute_error"]
7 train_lrscores.append(hist * 100)
/opt/conda/lib/python3.7/site-packages/scikeras/wrappers.py in fit(self, X, y, sample_weight, **kwargs)
1492 sample_weight = 1 if sample_weight is None else sample_weight
1493 sample_weight *= compute_sample_weight(class_weight=self.class_weight, y=y)
-> 1494 super().fit(X=X, y=y, sample_weight=sample_weight, **kwargs)
1495 return self
1496
/opt/conda/lib/python3.7/site-packages/scikeras/wrappers.py in fit(self, X, y, sample_weight, **kwargs)
765 sample_weight=sample_weight,
766 warm_start=self.warm_start,
--> 767 **kwargs,
768 )
769
/opt/conda/lib/python3.7/site-packages/scikeras/wrappers.py in _fit(self, X, y, sample_weight, warm_start, epochs, initial_epoch, **kwargs)
927 X = self.feature_encoder_.transform(X)
928
--> 929 self._check_model_compatibility(y)
930
931 self._fit_keras_model(
/opt/conda/lib/python3.7/site-packages/scikeras/wrappers.py in _check_model_compatibility(self, y)
549 # we recognize the attribute but do not force it to be
550 # generated
--> 551 if self.n_outputs_expected_ != len(self.model_.outputs):
552 raise ValueError(
553 "Detected a Keras model input of size"
TypeError: object of type 'NoneType' has no len()
X_train[cv_train]
array([[ 3.49907650e-01, 1.01934833e+00, 9.22962131e-01, ...,
4.65851423e-01, 5.85124577e-01, -2.30825406e-01],
[-1.66145691e-01, -1.70198795e-01, 7.40812556e-01, ...,
-1.25252966e-01, 6.11333541e-04, -1.85578709e+00],
[-3.34532309e-01, 1.47744989e+00, -7.94889360e-01, ...,
1.10431254e+00, 5.00866647e-01, 5.75451553e-01],
...,
[-1.21341832e+00, 8.56729999e-01, 1.87070578e-01, ...,
-8.38769062e-01, -7.08780127e-02, -6.54645722e-01],
[ 3.45711192e-01, 8.01029131e-01, 9.37260745e-01, ...,
6.35312010e-01, -1.77277404e-01, -1.05178867e+00],
[ 1.65016194e+00, 1.34960903e+00, 1.17654404e+00, ...,
3.79284887e-01, 4.38081218e-01, -3.55481467e-01]])
y_train
array([1, 3, 2, 2, 3, 2, 3, 3, 1, 2, 1, 1, 3, 2, 1, 1, 2, 3, 2, 1, 1, 1,
1, 0, 1, 2, 3, 1, 1, 0, 0, 1, 1, 3, 1, 1, 2, 0, 1, 1, 2, 1, 0, 3,
3, 0, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 2, 3, 1, 1, 3, 2,
3, 1, 1, 2, 1, 2, 1, 1, 0, 2, 2, 3, 3, 2, 1, 1, 3, 1, 3, 1, 1, 3,
1, 2, 0, 1, 2, 0, 2, 2, 2, 3, 1, 1, 2, 1, 0, 2, 2, 1, 1, 0, 2, 3,
3, 3, 3, 1, 1, 1, 1, 2, 3, 2, 1, 1, 1, 2, 2, 0, 3, 2, 1, 2, 3, 3,
2, 0, 3, 0, 1, 1, 1, 1, 2, 3, 3, 3, 2, 0, 3, 2, 3, 1, 3, 1, 2, 1,
2, 3, 2, 2, 3, 3, 1, 0, 3, 1, 3, 2, 2, 2, 2, 3, 3, 1, 3, 2, 3, 1,
3, 1, 2, 2, 1, 2, 3, 3, 1, 1, 2, 0, 2, 1, 2, 1, 3, 3, 3, 1, 3, 1,
1, 2, 3, 1, 1, 1, 2, 1, 2, 2, 1, 1, 2, 0, 2, 0, 3, 1, 2, 3, 1, 1,
3, 1, 3, 0, 3, 1, 3, 1, 1, 1, 1, 0, 3, 3, 2, 2, 3, 3, 1, 3, 1, 2,
1, 2, 2, 3, 2, 1, 2, 3, 3, 3, 3, 1, 2, 3, 1, 2, 1, 1, 1, 2, 1, 2,
3, 2, 1, 2, 1, 2, 1, 2, 3, 3, 1, 2, 0, 1, 2, 2, 2, 1, 1, 3, 3, 1,
3, 3, 2, 1, 3, 1, 3, 1, 1, 1, 3, 1, 3, 1, 2, 1, 0, 1, 2, 1, 2, 2,
1, 1, 2, 1, 2, 2, 2, 1, 3, 1, 2, 3, 2, 2, 3, 1, 2, 0, 0, 3, 2, 2,
2, 3, 2, 1, 1, 1, 1, 2, 2, 2, 1, 3, 1, 2, 1, 3, 2, 2, 1, 1, 1, 2,
3, 3, 2, 3, 2, 3, 1, 2, 2, 1, 2, 1, 1, 3, 3, 3, 2, 1, 1, 3, 2, 3,
3, 2, 1, 1, 1, 2, 3, 0, 1, 2, 1, 1, 2, 0, 2, 1, 0, 2, 0, 3, 2, 3,
2, 1, 1, 2, 3, 0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 0, 1, 2, 2])
Take a good look at your code and the error before posting a question. If that does not help, thoroughly read the documentation.
Keras fit() documentation -> What does .fit() return?
I believe you have made a typo: you expect the KerasClassifier object to have a .history_ attribute, but looking at your error, that attribute is clearly None at the point where you use it.
Be aware that a question caused by a typo will not help future readers, so in such cases it's better to search the existing questions first.
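As a rough sketch of the fix (assuming the scikeras wrapper shown in your traceback, where fit() returns the classifier itself and the training history is exposed afterwards as history_, a dict of per-epoch metric lists):

lr_model_logit = KerasClassifier(build_fn=build_logistic_regression_model, batch_size=10)
lr_model_logit.fit(X_train[cv_train], y_train[cv_train], epochs=200)

hist = lr_model_logit.history_   # e.g. {'loss': [...], 'accuracy': [...]}
losses = hist["loss"]            # 'mean_absolute_error' is only present if tracked as a metric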
Existing question

PyTorch's DataLoader returning the same set of labels for each batch

I'm using PyTorch to train a model.
My validation_labels (ground-truth labels) consist of the following values:
tensor([2, 0, 2, 2, 2, 0, 1, 1, 0, 2, 2, 0, 1, 2, 1, 2, 1, 1, 0, 1, 2, 2, 1, 2,
2, 2, 2, 1, 2, 1, 0, 2, 0, 2, 2, 2, 1, 2, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2,
1, 1, 0, 2, 1, 0, 2, 2, 2, 2, 2, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 2, 2,
2, 2, 1, 2, 0, 2, 0, 1, 1, 2, 2, 0, 2, 2, 1, 1, 2, 0, 2, 2, 2, 2, 2, 0,
2, 2, 0, 0, 2, 1, 2, 2, 2, 2, 0, 0, 0, 1, 0, 2, 1, 2, 1, 2, 0, 2, 1, 2,
1, 0, 1, 2, 2, 2, 2, 0, 2, 1, 0, 2, 1, 2, 1, 1, 0, 1, 2, 2, 2, 2, 1, 0,
1, 1, 0, 2, 2, 1, 2, 2, 0, 1, 2, 0, 2, 0, 1, 1, 2, 0, 2, 0, 2, 2, 2, 2,
2, 1, 2, 2, 1, 0, 2, 1, 2, 2, 2, 2, 0, 2, 0, 0, 2, 1, 2, 0, 0, 2, 0, 2,
0, 0, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 0, 1, 2, 1, 2, 0, 0, 1, 1, 1, 2,
1, 2, 0, 0, 0, 0, 2, 2, 0, 0, 0, 2, 1, 0, 2, 1, 2, 2, 0, 2, 2, 0, 1, 0,
1, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 0, 1, 0, 1, 2, 1, 0, 1, 2,
2, 2, 1, 2, 2, 2, 1, 0, 1, 2, 2, 0, 2, 2, 2, 0, 1, 2, 0, 2, 2, 0, 0, 1,
1, 1, 1, 1, 1, 2, 0, 2, 1, 0, 2, 1, 0, 2, 2, 2, 2, 2, 1, 1, 0, 2, 2, 2,
2, 2, 0, 2, 0, 2, 2, 2, 1, 1, 0, 2, 1, 0, 0, 2, 0, 2, 1, 2, 0, 2, 2, 1,
1, 1, 2, 2, 2, 0, 1, 0, 1, 2, 2, 2, 2, 2, 0, 1, 2, 0, 0, 0, 2, 1, 2, 0,
2, 1, 2, 1, 2, 2, 2, 0, 0, 2, 2, 2, 2, 0, 2, 0, 0, 2, 2, 1, 1, 2, 2, 2,
2, 0, 2, 2, 0, 2, 0, 1, 1, 0, 2, 0, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 0, 0,
2, 2, 2, 2, 2, 0, 2, 2, 0, 1, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 1, 2, 1,
2, 2, 2, 2, 1, 1, 1, 0, 0, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 0, 0, 0,
0, 1, 1, 0, 0], device='mps:0')
But using the below code to generate a DataLoader results in all the validation_labels being read back as '2's.
validation_data = TensorDataset(validation_inputs, validation_masks, validation_labels)
validation_sampler = SequentialSampler(validation_data)
validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size)

for step, batch in enumerate(validation_dataloader):
    batch = tuple(t.to(device) for t in batch)
    eval_data, eval_masks, eval_labels = batch
    print(eval_labels)
The eval labels get printed as:
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], device='mps:0')
Why are all the labels being changed to '2'? I'm not able to find out what is wrong with my code. Could someone tell me why this happens and what I should do about it?
This happened to me because the folder I was passing to the dataloader was the parent folder of the actual training data, i.e. the data was nested in training/training. After removing the outer layer, the dataloader was able to read the labels correctly.
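For anyone hitting the same thing, a minimal sketch with hypothetical paths (assuming a torchvision ImageFolder-style dataset, which infers labels from the immediate subfolder names):

from torchvision import datasets, transforms

# Wrong: 'training/' contains only the nested 'training/' folder, so a
# single class is inferred and every image receives that one label.
# dataset = datasets.ImageFolder('training')

# Right: point at the directory whose children are the class folders.
dataset = datasets.ImageFolder('training/training', transform=transforms.ToTensor())
print(dataset.class_to_idx)  # e.g. {'class_a': 0, 'class_b': 1, 'class_c': 2}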

ImageDataGenerator.flow_from_directory is unable to one-hot encode data

I am using ImageDataGenerator (tensorflow version 2.5.0) to load a number of jpg files that I am using for a classification system. I have specified class_mode='categorical'. My images are originally RGB, but even though I am converting them to greyscale, I don't think that should matter. However, when I call train_set.classes, the data I get is not one-hot encoded; it is sparse numerical data. Here is my ImageDataGenerator call:
def preprocessing_function(image):
    neg = 1 - image
    return neg

#image_path = sys.argv[1]
image_path = ''

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    vertical_flip=True,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    preprocessing_function=preprocessing_function)

train_set = train_datagen.flow_from_directory(
    os.path.join(image_path, 'endo_jpg/endo_256_2021_08_05/Training'),
    target_size=(100, 100),
    batch_size=batch,
    class_mode='categorical',
    color_mode='grayscale')
Upon calling the flow_from_directory method, I am returned what I expect:
Found 625 images belonging to 4 classes.
Calling train_set.classes, I get back a long list of integers, not one-hot encoded data:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3])
I can force the data to be one-hot encoded by using:
train_set.classes = tensorflow.keras.utils.to_categorical(train_set.classes)
but then I can't train with the data generator.
I think there is a problem with my specifying class_mode='categorical', but I have no idea why. I followed the example in the documentation (here), but even with categorical mode I get back sparse labels.
Since you are using class_mode='categorical', you don't have to manually convert the labels to one-hot encoded vectors using to_categorical().
The generator returns the labels one-hot encoded automatically: train_set.classes only stores the integer class index of each file, while the batches the generator yields contain the one-hot vectors.
Simply calling train_set[0] clearly shows the images and the labels; the printed labels are one-hot encoded.
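A quick way to verify this, using the train_set defined in the question (exact shapes depend on your data):

images, labels = train_set[0]  # first batch from the generator
print(images.shape)            # e.g. (batch, 100, 100, 1) for grayscale input
print(labels[:3])              # one-hot rows such as [0., 0., 1., 0.]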

Convert a array List to a simple list

I am trying to use classification_report from sklearn.metrics:
sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False)
As input for the predictions and labels, I've got one list each, of the following form:
for pred:
[array([0, 0, 0, 3, 0, 3, 2, 2, 1, 1, 2, 0, 2, 3, 0, 2, 2, 2, 2, 3, 3, 2,
2, 2, 2, 2, 2, 1, 0, 3, 2, 2, 0, 2, 2, 3, 2, 0, 0, 0, 0, 0, 2, 2,
2, 1, 0, 0, 0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 2, 3, 0, 2, 0, 2])]
for true:
[array([2, 3, 3, 2, 3, 3, 2, 3, 2, 2, 3, 3, 3, 2, 3, 2, 3, 2, 3, 2, 3, 3,
2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 3, 2,
2, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 2, 2, 3, 2, 2, 2, 2, 2])]
For the sklearn function above I need a simple list. The array produces an error:
ValueError: multiclass-multioutput is not supported
I already tried .tolist(), but it didn't work for me.
I am looking for a way to convert my list of arrays to a simple list.
Thanks for your help.
Each of those objects is already a list, each of which contains a single element, which is an array.
To access the first element and convert it to a list, try something like:
from numpy import array

x = [array([2, 3, 3, 2, 3, 3, 2, 3, 2, 2, 3, 3, 3, 2, 3, 2, 3, 2, 3, 2, 3, 3,
            2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 3, 2,
            2, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 2, 2, 3, 2, 2, 2, 2, 2])]
x_list = list(x[0])
And x_list will contain the array element in list form.
Way 1: Just index the lists, e.g. pred[0]
Code:
from numpy import array
from sklearn.metrics import classification_report

pred = [array([0, 0, 0, 3, 0, 3, 2, 2, 1, 1, 2, 0, 2, 3, 0, 2, 2, 2, 2, 3, 3, 2,
               2, 2, 2, 2, 2, 1, 0, 3, 2, 2, 0, 2, 2, 3, 2, 0, 0, 0, 0, 0, 2, 2,
               2, 1, 0, 0, 0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 2, 3, 0, 2, 0, 2])]
test = [array([0, 0, 0, 3, 0, 3, 2, 2, 1, 1, 2, 0, 2, 3, 0, 2, 2, 2, 2, 3, 3, 2,
               2, 2, 2, 2, 2, 1, 0, 3, 2, 2, 0, 2, 2, 3, 2, 0, 0, 0, 0, 0, 2, 2,
               2, 1, 0, 0, 0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 2, 3, 0, 2, 0, 2])]

print(classification_report(test[0], pred[0]))

Way 2: Flatten the lists to match sklearn's requirements (note that classification_report expects the true labels first, then the predictions):
flat_pred = [item for sublist in pred for item in sublist]
flat_test = [item for sublist in test for item in sublist]

print(classification_report(flat_test, flat_pred))

join or merge values calculated on grouped pandas dataframe

I have a DataFrame containing density values. I'd like to group by the 'hours' value, bin the densities, and add a new column to my original df containing the bin number. This is failing, however:
import numpy as np
import pandas as pd
import pysal

df = pd.DataFrame({
    'hours': np.random.randint(0, 24, 10000),
    'density': np.random.sample(10000)})

def func(df):
    """Calculate equal intervals of a series or array."""
    intervals = pysal.esda.mapclassify.Equal_Interval(df.density, 5)
    # yb is an ndarray containing the bin indices, 0 - 4 in this case
    return intervals.yb

df['bins'] = df.groupby(df.hours).transform(func)
Gives AssertionError: length of join_axes must not be equal to 0
If I just group the object and apply the interval function, it looks like this:
grp = df.groupby(df.hours).apply(func)
grp
Out[106]:
hours
0 [2, 4, 3, 4, 0, 4, 2, 2, 0, 1, 0, 0, 2, 2, 0, ...
1 [4, 1, 0, 4, 0, 2, 2, 3, 2, 3, 0, 3, 4, 3, 2, ...
2 [4, 1, 0, 2, 3, 4, 1, 1, 0, 3, 4, 4, 2, 4, 0, ...
3 [3, 0, 0, 4, 0, 0, 0, 1, 2, 2, 0, 2, 2, 2, 1, ...
4 [0, 1, 1, 2, 1, 3, 1, 3, 2, 2, 1, 4, 0, 4, 2, ...
5 [2, 0, 2, 1, 3, 1, 1, 0, 4, 4, 2, 1, 4, 1, 2, ...
6 [1, 2, 3, 3, 3, 2, 4, 1, 2, 1, 2, 0, 3, 2, 0, ...
7 [3, 0, 3, 1, 3, 1, 2, 1, 4, 2, 1, 2, 1, 1, 1, ...
8 [0, 1, 4, 3, 0, 1, 0, 0, 1, 0, 2, 1, 0, 1, 1, ...
9 [4, 2, 0, 4, 1, 3, 2, 3, 4, 1, 1, 4, 4, 4, 4, ...
10 [4, 4, 3, 3, 1, 2, 3, 0, 2, 4, 2, 4, 0, 2, 2, ...
11 [0, 1, 3, 0, 1, 1, 1, 1, 2, 1, 2, 0, 3, 3, 4, ...
12 [3, 1, 1, 0, 4, 4, 3, 0, 1, 2, 1, 1, 4, 2, 0, ...
13 [1, 1, 0, 2, 0, 1, 4, 1, 2, 2, 3, 1, 2, 0, 3, ...
14 [2, 4, 0, 2, 1, 2, 0, 4, 4, 2, 3, 4, 2, 1, 1, ...
15 [2, 4, 3, 4, 1, 0, 3, 1, 2, 0, 3, 4, 2, 2, 3, ...
16 [0, 4, 2, 3, 3, 4, 0, 3, 2, 0, 1, 0, 0, 2, 0, ...
17 [3, 1, 4, 4, 0, 4, 1, 0, 4, 3, 3, 2, 3, 1, 4, ...
18 [4, 3, 0, 2, 4, 2, 2, 0, 2, 2, 1, 2, 1, 0, 1, ...
19 [3, 0, 3, 1, 1, 0, 1, 1, 3, 3, 2, 3, 4, 0, 0, ...
20 [3, 0, 1, 4, 0, 0, 4, 2, 4, 2, 2, 0, 4, 0, 0, ...
21 [4, 2, 3, 3, 1, 2, 0, 4, 2, 0, 2, 2, 1, 2, 2, ...
22 [0, 4, 1, 1, 3, 1, 4, 1, 3, 4, 4, 0, 4, 4, 4, ...
23 [4, 1, 2, 0, 2, 0, 0, 0, 2, 3, 1, 1, 3, 0, 1, ...
dtype: object
Is there a standard way to join or merge values calculated from a grouped object, or should I be using transform differently?
Try to transform on the column, like this:
df['bins'] = df.groupby(df.hours).density.transform(func)
Note: func needs to be changed to accept a Series as its argument.
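A minimal sketch of that change (assuming the same pysal API as in the question; the function now receives the 'density' Series directly rather than the whole frame):

def func(series):
    """Calculate equal-interval bin indices for a Series."""
    intervals = pysal.esda.mapclassify.Equal_Interval(series, 5)
    return intervals.yb

df['bins'] = df.groupby(df.hours).density.transform(func)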
