• ALINK(四十一):模型评估(六)聚类评估 (EvalClusterBatchOp)


    Java 类名:com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp

    Python 类名:EvalClusterBatchOp

    功能介绍

    聚类评估是对聚类算法的预测结果进行效果评估,支持下列评估指标。

     

     

    参数说明

    名称

    中文名称

    描述

    类型

    是否必须?

    默认值

    predictionCol

    预测结果列名

    预测结果列名

    String

     

    labelCol

    标签列名

    输入表中的标签列名

    String

     

    null

    vectorCol

    向量列名

    输入表中的向量列名

    String

     

    null

    distanceType

    距离度量方式

    距离类型

    String

     

    "EUCLIDEAN"

    代码示例

    Python 代码

    from pyalink.alink import *
    import pandas as pd
    useLocalEnv(1)
    df = pd.DataFrame([
        [0, "0 0 0"],
        [0, "0.1,0.1,0.1"],
        [0, "0.2,0.2,0.2"],
        [1, "9 9 9"],
        [1, "9.1 9.1 9.1"],
        [1, "9.2 9.2 9.2"]
    ])
    inOp = BatchOperator.fromDataframe(df, schemaStr='id int, vec string')
    metrics = EvalClusterBatchOp().setVectorCol("vec").setPredictionCol("id").linkFrom(inOp).collectMetrics()
    print("Total Samples Number:", metrics.getCount())
    print("Cluster Number:", metrics.getK())
    print("Cluster Array:", metrics.getClusterArray())
    print("Cluster Count Array:", metrics.getCountArray())
    print("CP:", metrics.getCp())
    print("DB:", metrics.getDb())
    print("SP:", metrics.getSp())
    print("SSB:", metrics.getSsb())
    print("SSW:", metrics.getSsw())
    print("CH:", metrics.getVrc())

    Java 代码

    import org.apache.flink.types.Row;
    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import com.alibaba.alink.operator.common.evaluation.ClusterMetrics;
    import org.junit.Test;
    import java.util.Arrays;
    import java.util.List;
    public class EvalClusterBatchOpTest {
      @Test
      public void testEvalClusterBatchOp() throws Exception {
        List <Row> df = Arrays.asList(
          Row.of(0, "0 0 0"),
          Row.of(0, "0.1,0.1,0.1"),
          Row.of(0, "0.2,0.2,0.2"),
          Row.of(1, "9 9 9"),
          Row.of(1, "9.1 9.1 9.1"),
          Row.of(1, "9.2 9.2 9.2")
        );
        BatchOperator <?> inOp = new MemSourceBatchOp(df, "id int, vec string");
        ClusterMetrics metrics = new EvalClusterBatchOp().setVectorCol("vec").setPredictionCol("id").linkFrom(inOp)
          .collectMetrics();
        System.out.println("Total Samples Number:" + metrics.getCount());
        System.out.println("Cluster Number:" + metrics.getK());
        System.out.println("Cluster Array:" + Arrays.toString(metrics.getClusterArray()));
        System.out.println("Cluster Count Array:" + Arrays.toString(metrics.getCountArray()));
        System.out.println("CP:" + metrics.getCp());
        System.out.println("DB:" + metrics.getDb());
        System.out.println("SP:" + metrics.getSp());
        System.out.println("SSB:" + metrics.getSsb());
        System.out.println("SSW:" + metrics.getSsw());
        System.out.println("CH:" + metrics.getVrc());
      }
    }

    运行结果

    Total Samples Number: 6
    Cluster Number: 2
    Cluster Array: ['0', '1']
    Cluster Count Array: [3.0, 3.0]
    CP: 0.11547005383792497
    DB: 0.014814814814814791
    SP: 15.588457268119896
    SSB: 364.5
    SSW: 0.1199999999999996
    CH: 12150.000000000042
  • 相关阅读:
    alter noparallel
    朝代
    asp.net core 发布包含文件
    执行dotnet *.dll启动项目,如何修改环境变量----ASPNETCORE_ENVIRONMENT
    MySQL授权--WITH GRANT OPTION
    js/ts/tsx读取excel表格中的日期格式转换
    linux test tool--"ab"
    nginx代理配置
    docker 容器与本机文件的拷贝操作
    linux系统,没有安装任何编辑器的情况,如何操作文件
  • 原文地址:https://www.cnblogs.com/qiu-hua/p/14902458.html
Copyright © 2020-2023  润新知