• ALINK (36): Model Evaluation (1) Binary Classification Evaluation (EvalBinaryClassBatchOp)


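    Java class name: com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp

    Python class name: EvalBinaryClassBatchOp

    Overview

    Binary classification evaluation measures how well a binary classifier's predictions match the true labels.

    It supports plotting the ROC curve, the LiftChart curve, the K-S curve, and the Recall-Precision curve.

    Streaming jobs support both cumulative and windowed statistics; in addition to the four curves above, they also report how AUC, Kappa, Accuracy, and LogLoss change over time.

    The overall evaluation metrics are AUC, K-S, and PRC, plus Precision, Recall, F-Measure, Sensitivity, Accuracy, Specificity, and Kappa at different thresholds.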

    Confusion matrix
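    The threshold-dependent metrics listed above all derive from the four confusion-matrix counts. The following is a minimal sketch in plain Python, not Alink's implementation: the helper names (confusion_counts, safe_div, threshold_metrics), the 0.5 threshold, and the rule that a zero denominator yields 0.0 are assumptions made for illustration, and Alink may handle such edge cases differently. The data are the five sample rows from this article's code example, and F-Measure is computed as F1.

```python
# Sample rows from this article's example: 1 = positive ("prefix1"), 0 = negative ("prefix0").
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.75, 0.6]  # predicted probability of the positive class


def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when score >= threshold is predicted positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Assumed convention: a ratio with a zero denominator is reported as 0.0."""
    return numerator / denominator if denominator else 0.0


def threshold_metrics(tp, fp, tn, fn):
    """Derive the threshold-dependent metrics from the four counts."""
    n = tp + fp + tn + fn
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)       # also called sensitivity or TPR
    specificity = safe_div(tn, tn + fp)  # true negative rate
    accuracy = safe_div(tp + tn, n)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Cohen's kappa: agreement beyond what the marginal totals give by chance.
    expected = safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = safe_div(accuracy - expected, 1 - expected)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "kappa": kappa,
    }


counts = confusion_counts(labels, scores, threshold=0.5)
metrics = threshold_metrics(*counts)
print("TP, FP, TN, FN:", counts)
print(metrics)
```

    At a threshold of 0.5 every sample row is predicted positive, so the accuracy equals the share of positive rows (3 of 5) and the kappa is 0.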

    ROC curve

    X-axis: FPR (false positive rate)

    Y-axis: TPR (true positive rate)

    AUC

    The area under the ROC curve.
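    To make the ROC and AUC definitions concrete, here is a minimal sketch in plain Python. It is not Alink's implementation: the function names (roc_points, trapezoid_area) and the use of the trapezoidal rule are assumptions for illustration, and the sketch assumes both classes are present. The data are the five sample rows from this article's code example.

```python
labels = [1, 1, 1, 0, 0]             # 1 = "prefix1" (positive), 0 = "prefix0"
scores = [0.9, 0.8, 0.7, 0.75, 0.6]  # predicted probability of the positive class


def roc_points(labels, scores):
    """Return (FPR, TPR) points as the threshold sweeps from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Rows with the same score cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


curve = roc_points(labels, scores)
auc = trapezoid_area(curve)
print("ROC points:", curve)
print("AUC:", round(auc, 4))
```

    For the sample rows this gives 5/6, about 0.8333, which is also the fraction of positive/negative pairs in which the positive row has the higher score.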

    K-S curve

    X-axis: threshold

    Y-axis: TPR and FPR

    KS

    The maximum vertical gap between the TPR and FPR curves of the K-S plot.
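    The KS value can be sketched as a sweep over candidate thresholds that keeps the largest gap between TPR and FPR. This is an illustrative plain-Python sketch under stated assumptions, not Alink's implementation: the function name ks_statistic is hypothetical, each distinct score is used as a threshold, and a row counts as predicted positive when its score is at least the threshold.

```python
labels = [1, 1, 1, 0, 0]             # 1 = "prefix1" (positive), 0 = "prefix0"
scores = [0.9, 0.8, 0.7, 0.75, 0.6]  # predicted probability of the positive class


def ks_statistic(labels, scores):
    """Return (max |TPR - FPR|, threshold where the maximum is first reached)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_gap, best_threshold = 0.0, None
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if not y and s >= threshold)
        gap = abs(tp / positives - fp / negatives)
        if gap > best_gap:
            best_gap, best_threshold = gap, threshold
    return best_gap, best_threshold


ks, ks_threshold = ks_statistic(labels, scores)
print("KS:", round(ks, 4), "at threshold", ks_threshold)
```

    On the sample rows the largest gap is 2/3, reached at a threshold of 0.8, where two of the three positives and none of the negatives are predicted positive.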

    Recall-Precision curve

    X-axis: Recall

    Y-axis: Precision

    PRC

    The area under the Recall-Precision curve.
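    The PRC value can be approximated the same way as AUC, by integrating precision over recall. The sketch below is plain Python under explicit assumptions, not Alink's implementation: the function names are hypothetical, the curve is assumed to start at (recall 0, precision 1), and the area uses the trapezoidal rule. Other conventions (for example step-wise interpolation) give different numbers.

```python
labels = [1, 1, 1, 0, 0]             # 1 = "prefix1" (positive), 0 = "prefix0"
scores = [0.9, 0.8, 0.7, 0.75, 0.6]  # predicted probability of the positive class


def pr_points(labels, scores):
    """Return (recall, precision) points as the threshold sweeps from high to low."""
    positives = sum(labels)
    points = [(0.0, 1.0)]  # assumed starting point of the curve
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
        predicted = sum(1 for s in scores if s >= threshold)
        points.append((tp / positives, tp / predicted))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pr_curve = pr_points(labels, scores)
prc = trapezoid_area(pr_curve)
print("PR points:", pr_curve)
print("PRC:", round(prc, 4))
```

    Under these assumptions the sample rows give 65/72, about 0.9028.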

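    Parameters

    | Name | Display name | Description | Type | Required? | Default |
    | --- | --- | --- | --- | --- | --- |
    | predictionDetailCol | Prediction detail column | Name of the prediction detail column | String | Yes | |
    | labelCol | Label column | Name of the label column in the input table | String | Yes | |
    | positiveLabelValueString | Positive label | String form of the positive-class label | String | No | null |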

    Code examples

    Python code

    from pyalink.alink import *
    import pandas as pd

    useLocalEnv(1)

    # Each row: the true label, then a JSON string mapping each label to its predicted probability.
    df = pd.DataFrame([
        ["prefix1", "{\"prefix1\": 0.9, \"prefix0\": 0.1}"],
        ["prefix1", "{\"prefix1\": 0.8, \"prefix0\": 0.2}"],
        ["prefix1", "{\"prefix1\": 0.7, \"prefix0\": 0.3}"],
        ["prefix0", "{\"prefix1\": 0.75, \"prefix0\": 0.25}"],
        ["prefix0", "{\"prefix1\": 0.6, \"prefix0\": 0.4}"]
    ])
    inOp = BatchOperator.fromDataframe(df, schemaStr='label string, detailInput string')

    # Run the evaluation and collect the metrics locally.
    metrics = EvalBinaryClassBatchOp().setLabelCol("label").setPredictionDetailCol("detailInput").linkFrom(inOp).collectMetrics()
    print("AUC:", metrics.getAuc())
    print("KS:", metrics.getKs())
    print("PRC:", metrics.getPrc())
    print("Accuracy:", metrics.getAccuracy())
    print("Macro Precision:", metrics.getMacroPrecision())
    print("Micro Recall:", metrics.getMicroRecall())
    print("Weighted Sensitivity:", metrics.getWeightedSensitivity())

    Java code

    import org.apache.flink.types.Row;
    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import com.alibaba.alink.operator.common.evaluation.BinaryClassMetrics;
    import org.junit.Test;
    import java.util.Arrays;
    import java.util.List;

    public class EvalBinaryClassBatchOpTest {
      @Test
      public void testEvalBinaryClassBatchOp() throws Exception {
        // Each row: the true label, then a JSON string mapping each label to its predicted probability.
        List<Row> df = Arrays.asList(
          Row.of("prefix1", "{\"prefix1\": 0.9, \"prefix0\": 0.1}"),
          Row.of("prefix1", "{\"prefix1\": 0.8, \"prefix0\": 0.2}"),
          Row.of("prefix1", "{\"prefix1\": 0.7, \"prefix0\": 0.3}"),
          Row.of("prefix0", "{\"prefix1\": 0.75, \"prefix0\": 0.25}"),
          Row.of("prefix0", "{\"prefix1\": 0.6, \"prefix0\": 0.4}")
        );
        BatchOperator<?> inOp = new MemSourceBatchOp(df, "label string, detailInput string");
        // Run the evaluation and collect the metrics locally.
        BinaryClassMetrics metrics = new EvalBinaryClassBatchOp()
          .setLabelCol("label")
          .setPredictionDetailCol("detailInput")
          .linkFrom(inOp)
          .collectMetrics();
        System.out.println("AUC:" + metrics.getAuc());
        System.out.println("KS:" + metrics.getKs());
        System.out.println("PRC:" + metrics.getPrc());
        System.out.println("Accuracy:" + metrics.getAccuracy());
        System.out.println("Macro Precision:" + metrics.getMacroPrecision());
        System.out.println("Micro Recall:" + metrics.getMicroRecall());
        System.out.println("Weighted Sensitivity:" + metrics.getWeightedSensitivity());
      }
    }

    Output

    AUC: 0.8333333333333334
    KS: 0.6666666666666666
    PRC: 0.9027777777777777
    Accuracy: 0.6
    Macro Precision: 0.8
    Micro Recall: 0.6
    Weighted Sensitivity: 0.6
  • Original article: https://www.cnblogs.com/qiu-hua/p/14902313.html