How to compute weighted accuracy for multi-class classification? - python

I am doing multi-class classification on unbalanced classes. I'm using SGDClassifier(), GradientBoostingClassifier(), RandomForestClassifier(), and LogisticRegression() with class_weight='balanced'. To compare the results, I need to compute the accuracy. I tried the following way to compute weighted accuracy:
n_samples = len(y_train)
weights_cof = float(n_samples)/(n_classes*np.bincount(data[target_label].as_matrix().astype(int))[1:])
sample_weights = np.ones((n_samples,n_classes)) * weights_cof
print accuracy_score(y_test, y_pred, sample_weight=sample_weights)
y_train is a binary array, so sample_weights has the same shape as y_train, (n_samples, n_classes). When I run the script, I get the following error:
Update:
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 1596, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 424, in <module>
predict_country(featuresDF, score, featuresLabel, country_sample_size, 'gbc')
File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 313, in predict_country
print accuracy_score(y_test, y_pred, sample_weight=sample_weights)
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 183, in accuracy_score
return _weighted_sum(score, sample_weight, normalize)
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 108, in _weighted_sum
return np.average(sample_score, weights=sample_weight)
File "C:\ProgramData\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 1124, in average
"Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.

The error suggests that the shape of your sample_weights differs from that of your y_test/y_pred arrays. accuracy_score first builds a per-sample score array by comparing y_test with y_pred (in the multi-class case, the boolean array y_test == y_pred) and then passes it, along with sample_weights, to np.average. One of the first checks in np.average is that the array and the weights have the same shape, which apparently is not the case here.
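A minimal sketch of that shape check, using made-up arrays rather than the asker's data: np.average accepts a 1-D array of per-sample scores together with 1-D weights, but raises the same TypeError when the weights are 2-D and no axis is given.

```python
import numpy as np

# Hypothetical per-sample correctness flags, shape (n_samples,).
sample_score = np.array([True, False, True, True])

# Weights shaped like a one-hot label matrix, shape (n_samples, n_classes).
weights_2d = np.ones((4, 3))

# One weight per sample, shape (n_samples,).
weights_1d = np.array([1.0, 2.0, 1.0, 1.0])

try:
    np.average(sample_score, weights=weights_2d)
    shape_error_raised = False
except TypeError as exc:
    # Shapes differ and no axis was given, as in the traceback.
    shape_error_raised = True
    print(exc)

# Matching shapes: (1*1 + 0*2 + 1*1 + 1*1) / (1 + 2 + 1 + 1) = 0.6
weighted_mean = np.average(sample_score, weights=weights_1d)
print(weighted_mean)
```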
Update
Your comment "sample_weights, y_test, and y_pred have the same shape (n_samples, n_classes)" exposes the issue. According to the documentation for accuracy_score, sample_weight must be a 1-dimensional array with one weight per sample, and for multi-class (as opposed to multilabel) problems y_true and y_pred (in your case y_test and y_pred) should be 1-dimensional label arrays. Are you perhaps using one-hot encoded labels? If so, convert them to single-value labels and then try the accuracy score again.
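A minimal sketch of that fix, assuming the labels are one-hot encoded (the arrays below are made up for illustration). argmax turns each row into a class index, and scikit-learn's compute_sample_weight with class_weight='balanced' gives one weight per sample using the same n_samples / (n_classes * count) formula as the question's weights_cof. With weights computed this way from the true labels, the weighted accuracy equals the mean per-class recall, which is what balanced_accuracy_score (scikit-learn 0.20 and later) reports directly.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical one-hot labels for 3 imbalanced classes (4, 2 and 1 samples).
y_test_onehot = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0],
                          [0, 1, 0], [0, 1, 0], [0, 0, 1]])
y_pred_onehot = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0],
                          [0, 1, 0], [1, 0, 0], [0, 0, 1]])

# Collapse each one-hot row to a single class index: shape (n_samples,).
y_test = y_test_onehot.argmax(axis=1)
y_pred = y_pred_onehot.argmax(axis=1)

# One weight per sample: n_samples / (n_classes * count of that sample's class).
sample_weights = compute_sample_weight(class_weight="balanced", y=y_test)

weighted_acc = accuracy_score(y_test, y_pred, sample_weight=sample_weights)
balanced_acc = balanced_accuracy_score(y_test, y_pred)
print(weighted_acc, balanced_acc)
```

Here class 0 has recall 3/4, class 1 has 1/2 and class 2 has 1/1, so both numbers come out at (0.75 + 0.5 + 1) / 3 = 0.75, whereas plain accuracy_score(y_test, y_pred) would give 5/7.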

Related

Unknown is not supported - f1 score

I want to compute the F1 score for 32 predicted mask images against 32 true mask images. My data has these features:
predicted.shape [32,512,512]
true.shape [32,512,512]
type_of_target(predicted) Unknown
type_of_target(true) Unknown
type_of_target(predicted[0]) Continuous-multioutput
type_of_target(true[0]) Continuous-multioutput
When I run f1_score(true, predicted, average='macro'), I get this error:
f1_score(true, predicted, average='macro')
Traceback (most recent call last):
File "<ipython-input-75-7198c91642b6>", line 1, in <module>
f1_score(true, predicted, average='macro')
File "C:\Anaconda3\lib\site-packages\sklearn\metrics\_classification.py", line 1099, in f1_score
zero_division=zero_division)
File "C:\Anaconda3\lib\site-packages\sklearn\metrics\_classification.py", line 1226, in fbeta_score
zero_division=zero_division)
File "C:\Anaconda3\lib\site-packages\sklearn\metrics\_classification.py", line 1484, in precision_recall_fscore_support
pos_label)
File "C:\Anaconda3\lib\site-packages\sklearn\metrics\_classification.py", line 1301, in _check_set_wise_labels
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "C:\Anaconda3\lib\site-packages\sklearn\metrics\_classification.py", line 97, in _check_targets
raise ValueError("{0} is not supported".format(y_type))
ValueError: unknown is not supported
The F1 score is the harmonic mean of precision and recall, and precision and recall can only be calculated when the predicted values are categorical rather than continuous outputs. You need to convert the predictions (and, judging by type_of_target, the true masks as well) to categorical labels, for example by rounding up or down, and then flatten the arrays, since f1_score only accepts 1-D label arrays (or 2-D label indicator arrays), not 3-D stacks of masks.
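A minimal sketch of that conversion. The 0.5 threshold and the small random arrays standing in for the masks are assumptions, not taken from the question; both arrays are thresholded because type_of_target reports the true masks as continuous too.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.utils.multiclass import type_of_target

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the masks (smaller than 32x512x512 to keep it quick),
# holding continuous values in [0, 1] like the question's arrays.
predicted = rng.random((4, 64, 64))
true = rng.random((4, 64, 64))

# Assumed cut-off for turning continuous values into binary pixel labels.
threshold = 0.5

# Threshold, then flatten each stack of masks into one 1-D label array.
pred_labels = (predicted > threshold).astype(np.uint8).ravel()
true_labels = (true > threshold).astype(np.uint8).ravel()

label_type = type_of_target(pred_labels)  # now a supported target type
score = f1_score(true_labels, pred_labels, average="macro")
print(label_type, score)
```

With average='macro' the score is the unweighted mean of the F1 for background (0) and foreground (1) pixels; average='binary' would score only the foreground class.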
I think the F1 inputs should be 1-D arrays of labels.
Make sure that is the case.

Keras predict_classes method returns "list index out of range" error

I am new to CNNs and machine learning in general, and I have been trying to follow TensorFlow's Image Classification tutorial.
Now, the Google Colab can be found here. I've followed this official TensorFlow tutorial and changed it slightly so that it saves the model in h5 format instead of the tf format, so that I can use Keras' model.predict_classes.
I have trained the model and reloaded it from the saved file without problems. However, I repeatedly get a list index out of range error whenever I try to predict an image, which I do like this:
def predict():
    image = tf.io.read_file('target.jpeg')
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [224, 224])
    print(model.predict_classes(image)[0])
target.jpeg is one of the images I took from the flowers_photos dataset on which the model is trained.
The Traceback is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 7, in predict
File "/home/amitjoki/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/sequential.py", line 319, in predict_classes
proba = self.predict(x, batch_size=batch_size, verbose=verbose)
File "/home/amitjoki/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 821, in predict
use_multiprocessing=use_multiprocessing)
File "/home/amitjoki/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 712, in predict
callbacks=callbacks)
File "/home/amitjoki/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 187, in model_iteration
f = _make_execution_function(model, mode)
File "/home/amitjoki/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 555, in _make_execution_function
return model._make_execution_function(mode)
File "/home/amitjoki/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2037, in _make_execution_function
self._make_predict_function()
File "/home/amitjoki/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2027, in _make_predict_function
**kwargs)
File "/home/amitjoki/.local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3544, in function
return EagerExecutionFunction(inputs, outputs, updates=updates, name=name)
File "/home/amitjoki/.local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3468, in __init__
self.outputs[0] = array_ops.identity(self.outputs[0])
IndexError: list index out of range
I searched extensively but couldn't find a solution. It would be helpful if anyone could point me in the right direction to get this up and running.
All of the predict functions in Keras expect a batch of inputs. Therefore, since you are predicting on a single image, you need to add an axis at the beginning of the image tensor to represent the batch axis:
image = tf.expand_dims(image, axis=0) # the shape would be (1, 224, 224, 3)
print(model.predict_classes(image)[0])
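A self-contained sketch of the batch-axis fix. Because the asker's trained model and target.jpeg are not available, it uses a tiny untrained hypothetical model and a random image tensor. It also calls model.predict followed by argmax instead of predict_classes, since predict_classes has been removed in newer TensorFlow releases; for a multi-class output, predict_classes returned the argmax of the predictions, so this picks the same class.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 5  # hypothetical; e.g. the five flower classes

# Tiny untrained stand-in for the tutorial's model, expecting 224x224 RGB input.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES),
])

# Stand-in for the decoded and resized image: shape (224, 224, 3), no batch axis.
image = tf.random.uniform([224, 224, 3])

# Add the batch axis Keras expects: (224, 224, 3) -> (1, 224, 224, 3).
batch = tf.expand_dims(image, axis=0)

# predict returns one row of scores per image; argmax picks the class index.
scores = model.predict(batch, verbose=0)
class_id = int(np.argmax(scores, axis=-1)[0])
print(batch.shape, class_id)
```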

Shaping data for linear regression with TFlearn

I'm trying to expand the tflearn example for linear regression by increasing the number of columns to 21.
from trafficdata import X,Y
import tflearn
print(X.shape) #(1054, 21)
print(Y.shape) #(1054,)
# Linear Regression graph
input_ = tflearn.input_data(shape=[None,21])
linear = tflearn.single_unit(input_)
regression = tflearn.regression(linear, optimizer='sgd', loss='mean_square',
                                metric='R2', learning_rate=0.01)
m = tflearn.DNN(regression)
m.fit(X, Y, n_epoch=1000, show_metric=True, snapshot_epoch=False)
print("\nRegression result:")
print("Y = " + str(m.get_weights(linear.W)) +
      "*X + " + str(m.get_weights(linear.b)))
However, tflearn complains:
Traceback (most recent call last):
File "linearregression.py", line 16, in <module>
m.fit(X, Y, n_epoch=1000, show_metric=True, snapshot_epoch=False)
File "/usr/local/lib/python3.5/dist-packages/tflearn/models/dnn.py", line 216, in fit
callbacks=callbacks)
File "/usr/local/lib/python3.5/dist-packages/tflearn/helpers/trainer.py", line 339, in fit
show_metric)
File "/usr/local/lib/python3.5/dist-packages/tflearn/helpers/trainer.py", line 818, in _train
feed_batch)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 975, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (64,) for Tensor 'TargetsData/Y:0', which has shape '(21,)'
I found the shape (64, ) comes from the default batch size of tflearn.regression().
Do I need to transform the labels (Y)? In what way?
Thanks!
I tried the same thing and made these changes to get it to work:
# linear = tflearn.single_unit(input_)
linear = tflearn.fully_connected(input_, 1, activation='linear')
My guess is that with more than one feature you cannot use tflearn.single_unit(). You can add additional fully_connected layers, but the last one must have only 1 neuron because Y.shape = (?, 1).
You have 21 features. Therefore, you cannot use linear = tflearn.single_unit(input_)
Instead try this: linear = tflearn.fully_connected(input_, 21, activation='linear')
The error you get is because your labels, Y, have a shape of (1054,), so you have to preprocess them first.
Add the following before the # Linear Regression graph comment (this assumes numpy is imported as np):
Y = np.expand_dims(Y, -1)
Then, just before regression = tflearn.regression(linear, optimizer='sgd', loss='mean_square', metric='R2', learning_rate=0.01), add:
linear = tflearn.fully_connected(linear, 1, activation='linear')
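To see why the reshape and the single-unit output layer line up, here is a NumPy-only sketch with random stand-in data of the question's shapes. The weights are hypothetical and untrained, and no tflearn API is involved; it only checks the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins with the shapes printed in the question.
X = rng.random((1054, 21))
Y = rng.random(1054)

# Add a trailing axis: (1054,) -> (1054, 1), the same as Y.reshape(-1, 1).
Y_col = np.expand_dims(Y, -1)

# Hypothetical single-output linear layer: one weight per feature plus a bias.
W = rng.random((21, 1))
b = np.zeros(1)
predictions = X @ W + b  # shape (1054, 1), one prediction per sample

print(Y.shape, Y_col.shape, predictions.shape)
```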

TypeError raised by Pandas module when fitting a Scikit-Learn GridSearchCV

Been plagued by this bug for a while now and could use some hive-mind help to (hopefully) catch something I'm missing.
tl;dr version
Pandas is raising a TypeError when I pass ndarrays of floats to a scikit-learn GridSearchCV to train on.
full version
I'm using an sklearn GridSearchCV object to fit a 1D list of float target variables scaled between 0.0 and 1.0 to a 2D list of float feature variables (input variables) also scaled between 0 and 1. Both lists have the same number of samples.
The abbreviated code where I pass my training data to GridSearchCV.fit() is as follows:
# the training data are attributes of a model class defined elsewhere
model.X_scaled # ndarray of feature data of shape (N_samples, N_features)
model.Y_scaled # ndarray target data of length N_samples
# setup the GridSearchCV instance
search = grid_search.GridSearchCV(estimator=svr,
                                  param_grid=params,  # C and epsilon parameters are set elsewhere in a dict
                                  n_jobs=self.parallel_processes,
                                  scoring=self.scoring_metric,
                                  cv=np.shape(self.Y)[0],  # sets the number of cross-val folds; cv = # of samples is essentially LOO CV.
                                  verbose=0)
# print out some info to get a better idea of the training data
print "model.X_scaled"
print type(model.X_scaled[0][0]) # this is getting the type of one of the elements of X_scaled
print np.shape(model.X_scaled)
print model.X_scaled.tolist()
print "model.Y_scaled"
print type(model.Y_scaled) # this is getting the type of the entire array structure
print np.shape(model.Y_scaled)
print model.Y_scaled
# fit GridSearchCV on the training data
search.fit(model.X_scaled, model.Y_scaled)
When this code is run, I get the resulting output:
model.X_scaled
<type 'numpy.float64'>
(81, 16)
model.Y_scaled
<type 'numpy.ndarray'>
(81,)
Traceback (most recent call last):
File "SurrogateModel.py", line 416, in <module>
self_test()
File "SurrogateModel.py", line 409, in self_test
model.train()
File "SurrogateModel.py", line 350, in train
search.fit(model.X_scaled, model.Y_scaled)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/grid_search.py", line 804, in fit
return self._fit(X, y, ParameterGrid(self.param_grid))
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/grid_search.py", line 553, in _fit
for parameters in parameter_iterable
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 800, in __call__
while self.dispatch_one_batch(iterator):
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 658, in dispatch_one_batch
self._dispatch(tasks)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 566, in _dispatch
job = ImmediateComputeBatch(batch)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 180, in __init__
self.results = batch()
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1531, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/svm/base.py", line 193, in fit
fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/sklearn/svm/base.py", line 251, in _dense_fit
max_iter=self.max_iter, random_seed=random_seed)
File "sklearn/svm/libsvm.pyx", line 59, in sklearn.svm.libsvm.fit (sklearn/svm/libsvm.c:1576)
File "/home/jack/Software/Python/anaconda2/envs/xs1opt/lib/python2.7/site-packages/pandas/core/series.py", line 78, in wrapper
"{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>
The stack trace is a bit confusing to me. sklearn makes it all the way to the "fit" call to libsvm, but then a Pandas package raises a TypeError because it can't convert a series object to a float? I looked in the pandas series.py module and found additional context for where the TypeError is being raised:
# in pandas/core/series.py
def _coerce_method(converter):
    """ install the scalar coercion methods """
    def wrapper(self):
        if len(self) == 1:
            return converter(self.iloc[0])
        raise TypeError("cannot convert the series to "
                        "{0}".format(str(converter)))
    return wrapper
I don't understand how this Pandas function is being called when neither of the data structures I'm passing to scikit-learn are pandas objects. In the code I've provided, they're both ndarrays, but I've tried passing them as plain lists only to get the same TypeError. Pandas is used earlier in the code to read data from a csv into a DataFrame, but the data is converted to ndarrays before being put into X_scaled and Y_scaled.
The annoying thing is that nearly identical code runs perfectly fine in the script that preceded this one. The version of the code in which I have this problem is essentially a refactor of that script, but this functionality, training a grid-search object on the training data, remains mostly unchanged.
Any suggestions on what might be happening here are greatly appreciated. Thank you!
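One possible reading of the traceback, offered as an unconfirmed guess: the last scikit-learn frame is the libsvm fit call, whose scalar hyperparameters (such as C and epsilon) appear to be declared as C doubles there and so must be converted to floats on entry. A grid value that is a whole pandas Series rather than a number would then hit exactly the _coerce_method wrapper quoted above. The sketch reproduces that failure with a hypothetical params table built from a DataFrame; whether the question's params dict actually looks like this is an assumption. It uses ParameterGrid from sklearn.model_selection, where current scikit-learn releases keep it (the question uses the older sklearn.grid_search module).

```python
import pandas as pd
from sklearn.model_selection import ParameterGrid

# Hypothetical hyperparameter table, e.g. read from a CSV with pandas.
df = pd.DataFrame({"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1, 0.2]})

# Suspect pattern (assumed): each grid entry is a whole Series, not a number.
bad_grid = {"C": [df["C"]], "epsilon": [df["epsilon"]]}
bad_C = list(ParameterGrid(bad_grid))[0]["C"]

try:
    # The float conversion a double-typed parameter needs.
    float(bad_C)
    series_error_raised = False
except TypeError as exc:
    series_error_raised = True
    print(exc)

# Safer pattern: give the grid plain lists of Python floats.
good_grid = {"C": df["C"].tolist(), "epsilon": df["epsilon"].tolist()}
good_params = list(ParameterGrid(good_grid))
print(len(good_params), good_params[0])
```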

Tensorflow error using my own data for text classification

I've been playing with the Tensorflow library doing the tutorials.
I'm using this example, and I changed the parameter n_classes = 15 to n_classes = 2, since I only have two classes to classify.
I read the data like this:
train = pandas.read_csv('tensorflow_feed/test/train_with_abs.csv', header=None)
X_train, y_train = train[1], train[0]
test = pandas.read_csv('tensorflow_feed/test/test_with_abs.csv', header=None)
X_test, y_test = test[1], test[0]
But it gives the following error:
Total words: 35
Traceback (most recent call last):
File "/home/sumit/PycharmProjects/experiments/text_classification_save_restore.py", line 94, in <module>
classifier.fit(X_train, y_train)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/base.py", line 160, in fit
monitors=monitors)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 449, in _train_model
train_op, loss_op = self._get_train_ops(features, targets)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 673, in _get_train_ops
_, loss, train_op = self._call_model_fn(features, targets, ModeKeys.TRAIN)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 656, in _call_model_fn
features, targets, mode=mode)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/base.py", line 369, in _model_fn
predictions, loss = model_fn(features, targets)
File "/home/sumit/PycharmProjects/experiments/text_classification_save_restore.py", line 73, in rnn_model
word_list = tf.unpack(word_vectors, axis=1)
TypeError: unpack() got an unexpected keyword argument 'axis'
Process finished with exit code 1
The "axis" parameter was just added to tf.unpack on June 23, and the example you're looking at was changed to use it:
https://github.com/tensorflow/tensorflow/commit/eff93149a6dc8e6826898fd9f9c28c81e21c9836
So I suggest either:
- use an older version of the example from before that commit, e.g.:
  https://github.com/tensorflow/tensorflow/blob/892ca4ddc12852a7b4633fd08f163941356cb4e6/tensorflow/examples/skflow/text_classification_save_restore.py
- or build a newer TensorFlow from GitHub HEAD.
I hope that helps!
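For readers on TensorFlow 1.0 or later: tf.unpack was renamed to tf.unstack in that release, and tf.unstack accepts the axis argument. A minimal sketch with a random tensor standing in for word_vectors; its shape (batch of 4, 10 words, 8-dimensional embeddings) is an assumption for illustration.

```python
import tensorflow as tf

# Hypothetical stand-in for word_vectors: (batch, words per document, embedding size).
word_vectors = tf.random.uniform([4, 10, 8])

# Split along the word axis into a Python list of per-step tensors,
# each of shape (batch, embedding size), as the RNN example expects.
word_list = tf.unstack(word_vectors, axis=1)
print(len(word_list), word_list[0].shape)
```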
