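This example uses cross-validation to evaluate classifier output quality with the Receiver Operating Characteristic (ROC) metric. ROC curves typically show the true positive rate on the Y axis and the false positive rate on the X axis. The top left corner of the plot is therefore the "ideal" point: a false positive rate of zero and a true positive rate of one. That point is rarely reachable in practice, but it does mean that a model with a larger area under the curve (AUC) usually performs better.
The "steepness" of ROC curves is also important, since we want to maximize the true positive rate while minimizing the false positive rate.
This example shows the ROC response on the different data subsets created by K-fold cross-validation. From all of these curves it is possible to calculate the mean area under the curve and to see how much the curve varies when the training set is split into different subsets. This shows how the classifier output is affected by changes in the training data, and how different the splits generated by K-fold cross-validation are from one another.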
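To make the two axes concrete, the following is a minimal sketch that builds ROC points by hand for four made-up samples and integrates the curve with the trapezoidal rule. The helper names `roc_points` and `trapezoid_area` and the toy labels and scores are illustrative assumptions, not part of the original example, which relies on scikit-learn for this step.

```python
import numpy as np


def roc_points(y_true, scores):
    """Sweep a decision threshold over the scores and return (fpr, tpr)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    # Start at +inf (nothing predicted positive), then visit each distinct
    # score from highest to lowest.
    thresholds = np.r_[np.inf, np.sort(np.unique(scores))[::-1]]
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [], []
    for t in thresholds:
        predicted_positive = scores >= t
        tpr.append(np.sum(predicted_positive & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_positive & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)


def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


# Hypothetical labels and classifier scores, for illustration only
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)
roc_auc = trapezoid_area(fpr, tpr)
print(fpr, tpr, roc_auc)
```

A curve that rises steeply near the left edge gains true positives before it accumulates false positives, which is what pushes the area toward 1.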
Note:
See also sklearn.metrics.roc_auc_score, sklearn.model_selection.cross_val_score, and Receiver operating characteristic (ROC).
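When only the per-fold AUC values are needed, and not the curves themselves, `cross_val_score` with `scoring="roc_auc"` may be enough. The sketch below is an assumption-laden shortcut rather than the method used in this example: it keeps the two-class iris subset but omits the noise features, and it scores with the SVM decision function instead of predicted probabilities.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Two-class subset of iris, as in the example (no noise features added here)
iris = datasets.load_iris()
X, y = iris.data, iris.target
X, y = X[y != 2], y[y != 2]

cv = StratifiedKFold(n_splits=6)
clf = svm.SVC(kernel="linear")

# One AUC value per fold; the "roc_auc" scorer uses decision_function here
fold_aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")

print("AUC per fold:", fold_aucs)
print("mean = %0.2f, std = %0.2f" % (fold_aucs.mean(), fold_aucs.std()))
```

The spread of `fold_aucs` gives a rough sense of how sensitive the classifier is to the particular training split.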
[Figure: sphx_glr_plot_roc_crossval_001]
Output:
/home/circleci/project/examples/model_selection/plot_roc_crossval.py:75: DeprecationWarning: scipy.interp is deprecated and will be removed in SciPy 2.0.0, use numpy.interp instead
interp_tpr = interp(mean_fpr, viz.fpr, viz.tpr)
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from sklearn import svm, datasets
from sklearn.metrics import auc
# RocCurveDisplay.from_estimator takes the place of plot_roc_curve, which was
# removed in scikit-learn 1.2 (this assumes scikit-learn >= 1.0).
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import StratifiedKFold

# #############################################################################
# Data IO and generation

# Import some data to train on, keeping only two of the three iris classes
iris = datasets.load_iris()
X = iris.data
y = iris.target
X, y = X[y != 2], y[y != 2]
n_samples, n_features = X.shape

# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# #############################################################################
# Classification and ROC analysis

# Run the classifier with cross-validation and plot the ROC curves
cv = StratifiedKFold(n_splits=6)
classifier = svm.SVC(kernel='linear', probability=True,
                     random_state=random_state)

tprs = []
aucs = []
mean_fpr = np.linspace(0, 1, 100)

fig, ax = plt.subplots()
for i, (train, test) in enumerate(cv.split(X, y)):
    classifier.fit(X[train], y[train])
    viz = RocCurveDisplay.from_estimator(classifier, X[test], y[test],
                                         name='ROC fold {}'.format(i),
                                         alpha=0.3, lw=1, ax=ax)
    # Resample this fold's curve on the shared FPR grid so folds can be
    # averaged. np.interp is used instead of the deprecated scipy.interp,
    # as the DeprecationWarning in the script output recommends.
    interp_tpr = np.interp(mean_fpr, viz.fpr, viz.tpr)
    interp_tpr[0] = 0.0
    tprs.append(interp_tpr)
    aucs.append(viz.roc_auc)

# Diagonal reference line: the expected curve of a random classifier
ax.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',
        label='Chance', alpha=.8)

# Mean ROC curve over the folds, with the standard deviation of the fold AUCs
mean_tpr = np.mean(tprs, axis=0)
mean_tpr[-1] = 1.0
mean_auc = auc(mean_fpr, mean_tpr)
std_auc = np.std(aucs)
ax.plot(mean_fpr, mean_tpr, color='b',
        label=r'Mean ROC (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc, std_auc),
        lw=2, alpha=.8)

# Shaded band of one standard deviation around the mean TPR, clipped to [0, 1]
std_tpr = np.std(tprs, axis=0)
tprs_upper = np.minimum(mean_tpr + std_tpr, 1)
tprs_lower = np.maximum(mean_tpr - std_tpr, 0)
ax.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,
                label=r'$\pm$ 1 std. dev.')

ax.set(xlim=[-0.05, 1.05], ylim=[-0.05, 1.05],
       title="Receiver operating characteristic example")
ax.legend(loc="lower right")
plt.show()
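The interpolation step in the script exists because each fold yields ROC points at different false positive rates, so the curves cannot be averaged element by element until they share an X grid. A small sketch of that idea, using two made-up fold curves (the arrays in `fold_curves` and the 5-point grid are illustrative assumptions, chosen so the arithmetic is easy to follow):

```python
import numpy as np

# Two hypothetical per-fold ROC curves, each given as (fpr, tpr) arrays
# sampled at different false positive rates.
fold_curves = [
    (np.array([0.0, 0.5, 1.0]), np.array([0.0, 1.0, 1.0])),
    (np.array([0.0, 0.25, 1.0]), np.array([0.0, 0.5, 1.0])),
]

# Common FPR grid: 0, 0.25, 0.5, 0.75, 1
grid = np.linspace(0, 1, 5)

# Linearly resample every curve on the grid, then aggregate point by point
resampled = np.array([np.interp(grid, fpr, tpr) for fpr, tpr in fold_curves])
mean_tpr = resampled.mean(axis=0)
std_tpr = resampled.std(axis=0)

print("mean TPR:", mean_tpr)
print("std TPR: ", std_tpr)
```

A finer grid, such as the 100 points used in the script, gives a smoother mean curve at negligible cost.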
Total running time of the script: (0 minutes 0.674 seconds)
Estimated memory usage: 8 MB