ValueError while fitting logistic regression - Python

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
log = LogisticRegression()
print(x_train.shape)  # (5, 13)
print(x_test.shape)   # (3, 13)
print(y_train.shape)  # (5,)
print(y_test.shape)   # (3,)
log.fit(x_train, y_train)
I followed this code from YouTube and other internet sources, but it raises the following error. Please help me out.
Error:
ValueError Traceback (most recent call last)
<ipython-input-16-86c1075a1e93> in <module>
----> 1 log.fit(x_train,y_train)
/srv/conda/lib/python3.6/site-packages/sklearn/linear_model/logistic.py in fit(self, X, y, sample_weight)
1287 X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype, order="C",
1288 accept_large_sparse=solver != 'liblinear')
-> 1289 check_classification_targets(y)
1290 self.classes_ = np.unique(y)
1291 n_samples, n_features = X.shape
/srv/conda/lib/python3.6/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
169 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
170 'multilabel-indicator', 'multilabel-sequences']:
--> 171 raise ValueError("Unknown label type: %r" % y_type)
172
173
ValueError: Unknown label type: 'continuous'

Logistic regression is a statistical method for predicting binary classes: the dependent variable or target variable must be categorical. In your case, you have "continuous" targets, i.e. floating-point values rather than class labels; a sketch of the two usual fixes follows the list below.
Types of Logistic Regression:
Binary Logistic Regression: the target variable has only two possible outcomes.
Multinomial Logistic Regression: the target variable has three or more nominal categories.
Ordinal Logistic Regression: the target variable has three or more ordinal categories (example: a product rating from 1 to 5).
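A minimal sketch of both fixes, using toy data to stand in for the question's x and y (the median split in option 1 is an arbitrary example, not something from the question):
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy stand-ins for the question's x and y: 8 samples, 13 features, and a
# continuous float target, which triggers "Unknown label type: 'continuous'".
rng = np.random.RandomState(0)
x = rng.rand(8, 13)
y = rng.rand(8)

# Option 1: the task really is classification, so bin the continuous target
# into discrete classes first (the median split guarantees both classes
# appear in this toy example; a domain-specific threshold is more realistic).
y_classes = (y > np.median(y)).astype(int)
LogisticRegression().fit(x, y_classes)

# Option 2: the target really is continuous, so use a regression model instead.
LinearRegression().fit(x, y)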


How to generate a confusion matrix?

I have a school project on deep-learning face recognition. I need a confusion matrix to measure performance metrics like accuracy and precision. I tried the following code for this, but the y_test parameter raises an error. How can I solve this?
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(img_array, img_labels,
                                                    shuffle=True, stratify=img_labels,
                                                    test_size=0.1, random_state=42)
print('Number of training samples, height/width and channel count: ', x_train.shape)
print('Number of test samples, height/width and channel count: ', x_test.shape)
print('Number of samples and classes in training: ', y_train.shape)
print('Number of samples and classes in test: ', y_test.shape)
My code:
cm = confusion_matrix(y_test, y_pred)
print(cm)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [55], in <cell line: 1>()
----> 1 cm = confusion_matrix(y_test, y_pred)
2 print(cm)
File ~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:307, in confusion_matrix(y_true, y_pred, labels, sample_weight, normalize)
222 def confusion_matrix(
223 y_true, y_pred, *, labels=None, sample_weight=None, normalize=None
224 ):
225 """Compute confusion matrix to evaluate the accuracy of a classification.
226
227 By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}`
(...)
305 (0, 2, 1, 1)
306 """
--> 307 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
308 if y_type not in ("binary", "multiclass"):
309 raise ValueError("%s is not supported" % y_type)
File ~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:93, in _check_targets(y_true, y_pred)
90 y_type = {"multiclass"}
92 if len(y_type) > 1:
---> 93 raise ValueError(
94 "Classification metrics can't handle a mix of {0} and {1} targets".format(
95 type_true, type_pred
96 )
97 )
99 # We can't have more than one value on y_type => The set is no more needed
100 y_type = y_type.pop()
ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets
I know I should not be providing this as an answer, but I am not able to add comments right now.
The confusion matrix (like the classification report) expects both y_pred and y_test to be 1-D arrays of integer class labels.
The prediction of a TensorFlow model is usually a 2-D array in which each row holds the class probabilities for one sample, so you need to do some preprocessing on y_pred.
I came across something similar a few weeks ago; here are a few lines of code that may be helpful.
res = np.array(res)  # predictions as a NumPy array
res = res.flatten()  # collapse the (n_samples, 1) output to 1-D
res = np.round(res)  # round probabilities to 0/1 class labels
Please note that the above code is for binary classification. For multi-class output, use np.argmax instead.
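For the multi-class case in this question, where y_test is one-hot encoded and the model outputs per-class probabilities (which is what the "mix of multilabel-indicator and continuous-multioutput targets" in the error means), a hedged sketch of the preprocessing, with toy stand-in arrays:
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy stand-ins for the question's arrays: y_test is one-hot encoded
# (multilabel-indicator) and y_pred holds per-class probabilities
# (continuous-multioutput), matching what the error complains about.
y_test = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])

y_test_labels = np.argmax(y_test, axis=1)  # one-hot rows -> integer class labels
y_pred_labels = np.argmax(y_pred, axis=1)  # probability rows -> predicted class

cm = confusion_matrix(y_test_labels, y_pred_labels)
print(cm)  # correct predictions sit on the diagonal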

Unknown label type: 'continuous' while using random forest classifier on a multi-class classification problem

My code:
from sklearn.ensemble import RandomForestClassifier

rf_classifier = RandomForestClassifier(n_estimators=600, min_samples_split=25)
rf_classifier.fit(combined_x_train, y_train)
The error:
ValueError Traceback (most recent call last)
<ipython-input-55-3f817939cbaa> in <module>
1 rf_classifier = RandomForestClassifier(n_estimators=600, min_samples_split=25)
----> 2 rf_classifier.fit(combined_x_train, y_train)
3
~\AppData\Roaming\Python\Python39\site-packages\sklearn\ensemble\_forest.py in fit(self, X, y, sample_weight)
329 self.n_outputs_ = y.shape[1]
330
--> 331 y, expanded_class_weight = self._validate_y_class_weight(y)
332
333 if getattr(y, "dtype", None) != DOUBLE or not y.flags.contiguous:
~\AppData\Roaming\Python\Python39\site-packages\sklearn\ensemble\_forest.py in _validate_y_class_weight(self, y)
557
558 def _validate_y_class_weight(self, y):
--> 559 check_classification_targets(y)
560
561 y = np.copy(y)
~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
181 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
182 'multilabel-indicator', 'multilabel-sequences']:
--> 183 raise ValueError("Unknown label type: %r" % y_type)
184
185
ValueError: Unknown label type: 'continuous'
y_train is a NumPy array with values between 0 and 5: it is multi-class classification, with each class corresponding to an integer.
The dtype of y_train is int32.
I don't understand why I am getting this error.
This problem can occur when y_train is not of a type the classifier accepts. In this case, y_train was a pandas Series; when I converted it to a NumPy array, it worked fine.
Here's the code:
y_train = y_train.to_numpy(dtype="int")  # pandas Series -> integer NumPy array
y_test = y_test.to_numpy(dtype="int")
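As a quick diagnostic (my suggestion, not part of the original answer), sklearn exposes the same check the classifier runs internally, so you can see how your labels are being interpreted before calling fit:
import numpy as np
from sklearn.utils.multiclass import type_of_target

y_ints = np.array([0, 2, 1, 5, 3])     # integer class labels
y_floats = np.array([0.2, 0.8, 0.5])   # non-integer floats

print(type_of_target(y_ints))    # 'multiclass' -> accepted by classifiers
print(type_of_target(y_floats))  # 'continuous' -> raises "Unknown label type"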

Pandas returns this: ValueError: Unknown label type: 'continuous'

I am having trouble using pandas and sklearn for machine learning. My problem is:
ValueError: Unknown label type: 'continuous'
I tried
import sklearn.tree

model = sklearn.tree.DecisionTreeClassifier()
model.fit(X, y)
and it returns this error:
ValueError Traceback (most recent call last)
<ipython-input-45-3caead2f350b> in <module>
----> 1 model.fit(ninp, out)
c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
888 """
889
--> 890 super().fit(
891 X, y,
892 sample_weight=sample_weight,
c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
179
180 if is_classification:
--> 181 check_classification_targets(y)
182 y = np.copy(y)
183
c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
170 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
171 'multilabel-indicator', 'multilabel-sequences']:
--> 172 raise ValueError("Unknown label type: %r" % y_type)
173
174
ValueError: Unknown label type: 'continuous'
A classifier classifies a set of examples into discrete classes (i.e. it assigns a label corresponding to one of K classes). If your target (the content of your y variable) is continuous (for example a float ranging between 0 and 1), then the decision tree does not know what to do with it.
You have two solutions, both sketched below:
Your problem is a classification task, and you need to encode your target variable so that it represents categories rather than a continuous value.
Your problem is not a classification task but a regression task, and you need the corresponding model (e.g. DecisionTreeRegressor).
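A minimal sketch of both options, with toy data standing in for the question's X and y (the 0.5 cutoff in option 1 is an arbitrary example):
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data: 10 samples, 4 features, continuous targets in [0, 1).
rng = np.random.RandomState(0)
X = rng.rand(10, 4)
y = rng.rand(10)

# Option 1: classification - bin the continuous target into categories first.
y_classes = (y > 0.5).astype(int)
DecisionTreeClassifier().fit(X, y_classes)

# Option 2: regression - keep the continuous target and use a regressor.
DecisionTreeRegressor().fit(X, y)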

ValueError: Unknown label type: 'unknown' while using KNN

I am new to Python and trying to run KNN, but when I run the code I get the error ValueError: Unknown label type: 'unknown'.
I have encoded all the categorical data and dropped the columns I don't need, to avoid the dummy variable trap.
What else do I need to do to clear this?
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import fbeta_score
import matplotlib.pyplot as plt

training_accuracy = []
test_accuracy = []
neighbors_settings = range(1, 11)
for n_neighbors in neighbors_settings:
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(x_train, y_train)
    train_pred = knn.predict(x_train)
    test_pred = knn.predict(x_test)
    training_accuracy.append(fbeta_score(y_train, train_pred, beta=1))
    test_accuracy.append(fbeta_score(y_test, test_pred, beta=1))
plt.plot(neighbors_settings, training_accuracy, label="training accuracy")
plt.plot(neighbors_settings, test_accuracy, label="test accuracy")
plt.ylabel("Accuracy")
plt.xlabel("n_neighbors")
plt.legend()
plt.savefig('knn_compare_model')
I expect a graph showing the test and training accuracy, but I get the error below:
ValueError Traceback (most recent call last)
<ipython-input-22-8a3a1f3c5c24> in <module>
11 # build the model
12 knn = KNeighborsClassifier(n_neighbors=n_neighbors)
---> 13 knn.fit(x_train, y_train)
14
15 # if accuracy of prediction on training set is high but it is low on test set: So overfitting
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\neighbors\base.py in fit(self, X, y)
903 self.outputs_2d_ = True
904
--> 905 check_classification_targets(y)
906 self.classes_ = []
907 self._y = np.empty(y.shape, dtype=np.int)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
169 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
170 'multilabel-indicator', 'multilabel-sequences']:
--> 171 raise ValueError("Unknown label type: %r" % y_type)
172
173
ValueError: Unknown label type: 'unknown'
Your y_train could be of object dtype, which would cause this error. Add the line
y_train = y_train.astype('int')
before
knn.fit(x_train, y_train)
and do the same with your y_test.
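If the labels are strings stored in an object-dtype array, astype('int') will fail; in that case LabelEncoder is the usual alternative (a hedged sketch with toy labels, not part of the original answer):
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy object-dtype labels standing in for the question's y_train / y_test.
y_train = np.array(['cat', 'dog', 'cat'], dtype=object)
y_test = np.array(['dog', 'cat'], dtype=object)

# Fit on the training labels and reuse the same mapping for the test labels,
# so both arrays share one consistent integer encoding.
le = LabelEncoder()
y_train = le.fit_transform(y_train)  # -> array([0, 1, 0])
y_test = le.transform(y_test)        # -> array([1, 0])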

sklearn 0.14.1 RBM dies on NaN or Inf where there is none

I'm borrowing an idea here from the documentation, using RBMs + logistic regression for classification.
However, I'm getting an error that should not be thrown, since all entries in my data matrix are numerical.
Code:
from sklearn import preprocessing, cross_validation
from scipy.ndimage import convolve
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn import linear_model, datasets, metrics
import numpy as np

# create fake dataset
data, labels = datasets.make_classification(n_samples=250000)
data = preprocessing.scale(data)
X_train, X_test, y_train, y_test = cross_validation.train_test_split(data, labels, test_size=0.7, random_state=0)

# print details
print X_train.shape, X_test.shape, y_train.shape, y_test.shape
print np.max(X_train)
print np.min(X_train)
print np.mean(X_train, axis=0)
print np.std(X_train, axis=0)

if np.sum(np.isnan(X_train)) or np.sum(np.isnan(X_test)):
    print "NaN found!"
if np.sum(np.isnan(y_train)) or np.sum(np.isnan(y_test)):
    print "NaN found!"
if np.sum(np.isinf(X_train)) or np.sum(np.isinf(X_test)):
    print "Inf found!"
if np.sum(np.isinf(y_train)) or np.sum(np.isinf(y_test)):
    print "Inf found!"

# train and test
logistic = linear_model.LogisticRegression()
rbm = BernoulliRBM(random_state=0, verbose=True)
classifier = Pipeline(steps=[('rbm', rbm), ('logistic', logistic)])

# Training RBM-Logistic Pipeline
classifier.fit(X_train, y_train)

# Training Logistic regression
logistic_classifier = linear_model.LogisticRegression(C=100.0)
logistic_classifier.fit(X_train, y_train)

print("Logistic regression using RBM features:\n%s\n" % (
    metrics.classification_report(
        y_test,
        classifier.predict(X_test))))
Output:
(73517, 3) (171540, 3) (73517,) (171540,)
2.0871168057
-2.21062647188
[-0.00237028 -0.00104526 0.00330683]
[ 0.99907225 0.99977328 1.00225843]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
173 else:
174 filename = fname
--> 175 __builtin__.execfile(filename, *where)
/home/test.py in <module>()
75
76 # Training RBM-Logistic Pipeline
---> 77 classifier.fit(X_train, y_train)
78
79 # Training Logistic regression
/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
128 data, then fit the transformed data using the final estimator.
129 """
--> 130 Xt, fit_params = self._pre_transform(X, y, **fit_params)
131 self.steps[-1][-1].fit(Xt, y, **fit_params)
132 return self
/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
118 for name, transform in self.steps[:-1]:
119 if hasattr(transform, "fit_transform"):
--> 120 Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
121 else:
122 Xt = transform.fit(Xt, y, **fit_params_steps[name]) \
/usr/local/lib/python2.7/dist-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit_params)
409 else:
410 # fit method of arity 2 (supervised transformation)
--> 411 return self.fit(X, y, **fit_params).transform(X)
412
413
/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/rbm.pyc in fit(self, X, y)
304
305 for batch_slice in batch_slices:
--> 306 pl_batch = self._fit(X[batch_slice], rng)
307
308 if verbose:
/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/rbm.pyc in _fit(self, v_pos, rng)
245
246 if self.verbose:
--> 247 return self.score_samples(v_pos)
248
249 def score_samples(self, v):
/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/rbm.pyc in score_samples(self, v)
268 fe_ = self._free_energy(v_)
269
--> 270 return v.shape[1] * logistic_sigmoid(fe_ - fe, log=True)
271
272 def fit(self, X, y=None):
/usr/local/lib/python2.7/dist-packages/sklearn/utils/extmath.pyc in logistic_sigmoid(X, log, out)
498 """
499 is_1d = X.ndim == 1
--> 500 X = array2d(X, dtype=np.float)
501
502 n_samples, n_features = X.shape
/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.pyc in array2d(X, dtype, order, copy, force_all_finite)
91 X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
92 if force_all_finite:
---> 93 _assert_all_finite(X_2d)
94 if X is X_2d and copy:
95 X_2d = safe_copy(X_2d)
/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.pyc in _assert_all_finite(X)
25 if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
26 and not np.isfinite(X).all()):
---> 27 raise ValueError("Array contains NaN or infinity.")
28
29
ValueError: Array contains NaN or infinity.
There are no infs or nans in the data matrix...what could be causing this behaviour?
EDIT: Apparently I'm not the only one.
This looks like a numerical stability bug in RBMs. Can you please open a GitHub issue with your script in it?
Edit: by the way, if you are interested, you can try to find the source of the issue by adding np.isfinite() checks in the inner loops of the _fit method of the BernoulliRBM class.
This issue is usually caused by two factors: incorrect initial scaling of the data, and too high a learning rate. First, the input data needs to be bounded between 0 and 1; remember that RBMs were originally designed for binary data only. Second, the learning rate could be too high; the defaults in RBM code are often based on the MNIST digit recognition dataset, which can handle larger learning rates.
So I would trust sklearn's implementation, but not the stability of the algorithm on a new dataset with default values that don't fit that dataset. Adding checks for infinity won't help; you will still need to tweak the learning rate.
This is why deep learning is said to be a bit of an art: you probably also need to play around with the number of Gibbs samples, the size of the minibatch, and the amount of momentum. Don't give up though, the rewards are mostly worth it. Further reading
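A hedged sketch of both suggestions, rescaling the features into [0, 1] and lowering the RBM learning rate (written against the modern sklearn API; the learning_rate and n_components values are arbitrary examples, not tuned settings):
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

data, labels = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.3, random_state=0)

# Bound the features to [0, 1] as BernoulliRBM expects, and use a smaller
# learning rate than the default (0.1) for stability on non-MNIST data.
classifier = Pipeline(steps=[
    ('scale', MinMaxScaler()),
    ('rbm', BernoulliRBM(learning_rate=0.01, n_components=64, random_state=0)),
    ('logistic', LogisticRegression()),
])
classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))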
