ValueError: For multi-metric scoring, the parameter refit must be set to a scorer key or a callable

2024-05-27 18:23

Hi everyone, I'm 编程小6. I write up the problems and ideas I run into during development. Today's topic is the error ValueError: For multi-metric scoring, the parameter refit must be set to a scorer key or a callable. I hope it helps.

ValueError: For multi-metric scoring, the parameter refit must be set to a scorer key or a callable to refit an estimator with the best parameter setting on the whole data and make the best_* attributes available for that metric. If this is not needed, refit should be set to False explicitly. True was passed.

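Problem:

When scoring contains more than one metric, GridSearchCV cannot tell which metric should decide the best parameters for the final refit, so the metric has to be specified explicitly. The call below leaves refit at its default of True, which triggers the error: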

clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring)

import numpy as np
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import log_loss, make_scorer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

iris = datasets.load_iris()
X = iris.data
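# Collapse the original classes 0, 1, 2 into a binary problem: class 0 vs. the rest (1)
y = np.where(iris.target == 0, 0, 1)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True, stratify=y)

# Parameter grid for the search
alphas = np.logspace(1, 10, 100, base=10)
parameters = {'C': [1, 10], 'solver': ('liblinear', 'saga')}
# parameters = {'C': alphas}
# Build a logistic regression model with L1 regularization
# (max_iter is an integer here; recent scikit-learn releases reject the float 1e5)
log_lr = linear_model.LogisticRegression(penalty='l1', max_iter=100000, solver='liblinear')
# Build a log-loss scorer; lower is better, so the score is negated
# (newer scikit-learn releases deprecate needs_proba in favor of response_method='predict_proba')
LogLoss = make_scorer(log_loss, greater_is_better=False, needs_proba=True)
# GridSearchCV with two metrics
scoring = {'AUC': 'roc_auc', 'LogLoss': LogLoss}
# clf = GridSearchCV(log_lr, parameters, cv=5, scoring=LogLoss)
# clf = GridSearchCV(log_lr, parameters, cv=5)
# Multi-metric scoring with the default refit=True: fit() raises the ValueError
clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring)
# clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring,refit='AUC')
# Fit the model
clf.fit(X_train, y_train)
print(clf.best_score_, clf.best_estimator_)
iris_model = clf.best_estimator_
# Print the classification report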
print('---------------classification report-------------------')
y_pred = iris_model.predict(X_test)
print(classification_report(y_test, y_pred))

Solution: set refit to one of the keys of the scoring dict, here 'AUC':

clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring,refit='AUC')
 

import numpy as np
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import log_loss, make_scorer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

iris = datasets.load_iris()
X = iris.data
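# Collapse the original classes 0, 1, 2 into a binary problem: class 0 vs. the rest (1)
y = np.where(iris.target == 0, 0, 1)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True, stratify=y)

# Parameter grid for the search
alphas = np.logspace(1, 10, 100, base=10)
parameters = {'C': [1, 10], 'solver': ('liblinear', 'saga')}
# parameters = {'C': alphas}
# Build a logistic regression model with L1 regularization
# (max_iter is an integer here; recent scikit-learn releases reject the float 1e5)
log_lr = linear_model.LogisticRegression(penalty='l1', max_iter=100000, solver='liblinear')
# Build a log-loss scorer; lower is better, so the score is negated
# (newer scikit-learn releases deprecate needs_proba in favor of response_method='predict_proba')
LogLoss = make_scorer(log_loss, greater_is_better=False, needs_proba=True)
# GridSearchCV with two metrics
scoring = {'AUC': 'roc_auc', 'LogLoss': LogLoss}
# clf = GridSearchCV(log_lr, parameters, cv=5, scoring=LogLoss)
# clf = GridSearchCV(log_lr, parameters, cv=5)
# clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring)
# Multi-metric scoring; refit='AUC' selects and refits the best model by AUC
clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring,refit='AUC')
# Fit the model
clf.fit(X_train, y_train)
print(clf.best_score_, clf.best_estimator_)
iris_model = clf.best_estimator_
# Print the classification report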
print('---------------classification report-------------------')
y_pred = iris_model.predict(X_test)
print(classification_report(y_test, y_pred))
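
With refit='AUC', the attributes best_score_, best_params_ and best_index_ all refer to the AUC metric, while the other metric stays available in cv_results_. The sketch below is only an illustration and not part of the original code: it assumes the full iris data without a train/test split, uses the built-in 'neg_log_loss' scorer string in place of the custom LogLoss scorer, and keeps the default penalty to shorten the setup.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X = iris.data
y = np.where(iris.target == 0, 0, 1)  # binary problem: class 0 vs. the rest

# Assumption: built-in scorer strings stand in for the custom LogLoss scorer.
scoring = {'AUC': 'roc_auc', 'LogLoss': 'neg_log_loss'}
parameters = {'C': [1, 10]}
log_lr = LogisticRegression(solver='liblinear')

clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring, refit='AUC')
clf.fit(X, y)

# cv_results_ has one "mean_test_<key>" column per key of the scoring dict.
results = clf.cv_results_
best = clf.best_index_
auc_at_best = results['mean_test_AUC'][best]
logloss_at_best = results['mean_test_LogLoss'][best]  # negated log loss

print('best_score_ (AUC):', clf.best_score_)
print('AUC at best_index_:', auc_at_best)
print('negated log loss at best_index_:', logloss_at_best)
```

Here best_score_ is simply the mean_test_AUC entry at best_index_; the log-loss value of the same parameter setting has to be read from cv_results_ by hand.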

Full error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-e7a5d74dc020> in <module>
     27 clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring)
     28 # Fit the model
---> 29 clf.fit(X_train, y_train)
     30 print(clf.best_score_, clf.best_estimator_)
     31 iris_model = clf.best_estimator_

D:\anaconda\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

D:\anaconda\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    754         else:
    755             scorers = _check_multimetric_scoring(self.estimator, self.scoring)
--> 756             self._check_refit_for_multimetric(scorers)
    757             refit_metric = self.refit
    758 

D:\anaconda\lib\site-packages\sklearn\model_selection\_search.py in _check_refit_for_multimetric(self, scores)
    719         if (self.refit is not False and not valid_refit_dict
    720                 and not callable(self.refit)):
--> 721             raise ValueError(multimetric_refit_msg)
    722 
    723     @_deprecate_positional_args

ValueError: For multi-metric scoring, the parameter refit must be set to a scorer key or a callable to refit an estimator with the best parameter setting on the whole data and make the best_* attributes available for that metric. If this is not needed, refit should be set to False explicitly. True was passed.


The GridSearchCV documentation defines refit as:

refit : boolean, string, or callable, default=True

Refit an estimator using the best found parameters on the whole dataset. For multiple metric evaluation, this needs to be a string denoting the scorer that would be used to find the best parameters for refitting the estimator at the end. Where there are considerations other than maximum score in choosing a best estimator, refit can be set to a function which returns the selected best_index_ given cv_results_. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this GridSearchCV instance. Also for multiple metric evaluation, the attributes best_index_, best_score_ and best_params_ will only be available if refit is set and all of them will be determined w.r.t this specific scorer. best_score_ is not returned if refit is callable. See scoring parameter to know more about multiple metric evaluation.
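
The callable form of refit described above can be sketched as follows. This is a minimal illustration under assumptions: the helper name pick_lowest_log_loss is hypothetical, the built-in 'neg_log_loss' scorer string stands in for a custom log-loss scorer, and the search is fitted on the full iris data.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X = iris.data
y = np.where(iris.target == 0, 0, 1)  # binary problem: class 0 vs. the rest

scoring = {'AUC': 'roc_auc', 'LogLoss': 'neg_log_loss'}
parameters = {'C': [1, 10]}
log_lr = LogisticRegression(solver='liblinear')


def pick_lowest_log_loss(cv_results):
    """Hypothetical selector: return the index of the best candidate.

    mean_test_LogLoss holds the negated log loss, so the largest
    value corresponds to the lowest log loss.
    """
    return int(np.argmax(cv_results['mean_test_LogLoss']))


clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring,
                   refit=pick_lowest_log_loss)
clf.fit(X, y)

print(clf.best_index_, clf.best_params_)
print(clf.best_estimator_)
# best_score_ is not set when refit is a callable
print(hasattr(clf, 'best_score_'))
```

A callable is useful when the choice depends on more than one column of cv_results_, for example trading a small loss in score for a simpler model.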

If you don't want to refit the estimator, you can set refit=False (as a boolean). To refit the estimator with one of the scorers, set refit to that scorer's key in the scoring dict, for example refit='precision_score' when scoring contains a 'precision_score' key.
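
For the refit=False case, a short sketch of what remains available after fitting. As before, this is an illustration under assumptions (full iris data, built-in scorer strings, a reduced parameter grid), not the original code.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X = iris.data
y = np.where(iris.target == 0, 0, 1)  # binary problem: class 0 vs. the rest

scoring = {'AUC': 'roc_auc', 'LogLoss': 'neg_log_loss'}
parameters = {'C': [1, 10]}
log_lr = LogisticRegression(solver='liblinear')

# refit=False: every candidate is cross-validated, but no final model is fitted.
clf = GridSearchCV(log_lr, parameters, cv=5, scoring=scoring, refit=False)
clf.fit(X, y)

# Per-candidate scores for every metric are still recorded.
results = clf.cv_results_
for params, auc, neg_ll in zip(results['params'],
                               results['mean_test_AUC'],
                               results['mean_test_LogLoss']):
    print(params, 'AUC:', auc, 'negated log loss:', neg_ll)

# No best_* attributes exist in this mode.
print(hasattr(clf, 'best_estimator_'), hasattr(clf, 'best_index_'))
```

This mode suits cases where you only want to compare metrics across the grid and will train the final model yourself.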

Reference: How to fix the error “For multi-metric scoring” for OneClassSVM and GridSearchCV
Reference: GridSearchCV
