• ALINK(四十一):模型评估(六)聚类评估 (EvalClusterBatchOp)


    Java 类名:com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp

    Python 类名:EvalClusterBatchOp

    功能介绍

    聚类评估是对聚类算法的预测结果进行效果评估,支持下列评估指标。

     

     

    参数说明

    名称

    中文名称

    描述

    类型

    是否必须?

    默认值

    predictionCol

    预测结果列名

    预测结果列名

    String

     

    labelCol

    标签列名

    输入表中的标签列名

    String

     

    null

    vectorCol

    向量列名

    输入表中的向量列名

    String

     

    null

    distanceType

    距离度量方式

    距离类型

    String

     

    "EUCLIDEAN"

    代码示例

    Python 代码

    from pyalink.alink import *
    import pandas as pd
    useLocalEnv(1)
    df = pd.DataFrame([
        [0, "0 0 0"],
        [0, "0.1,0.1,0.1"],
        [0, "0.2,0.2,0.2"],
        [1, "9 9 9"],
        [1, "9.1 9.1 9.1"],
        [1, "9.2 9.2 9.2"]
    ])
    inOp = BatchOperator.fromDataframe(df, schemaStr='id int, vec string')
    metrics = EvalClusterBatchOp().setVectorCol("vec").setPredictionCol("id").linkFrom(inOp).collectMetrics()
    print("Total Samples Number:", metrics.getCount())
    print("Cluster Number:", metrics.getK())
    print("Cluster Array:", metrics.getClusterArray())
    print("Cluster Count Array:", metrics.getCountArray())
    print("CP:", metrics.getCp())
    print("DB:", metrics.getDb())
    print("SP:", metrics.getSp())
    print("SSB:", metrics.getSsb())
    print("SSW:", metrics.getSsw())
    print("CH:", metrics.getVrc())

    Java 代码

    import org.apache.flink.types.Row;
    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import com.alibaba.alink.operator.common.evaluation.ClusterMetrics;
    import org.junit.Test;
    import java.util.Arrays;
    import java.util.List;
    public class EvalClusterBatchOpTest {
      @Test
      public void testEvalClusterBatchOp() throws Exception {
        List <Row> df = Arrays.asList(
          Row.of(0, "0 0 0"),
          Row.of(0, "0.1,0.1,0.1"),
          Row.of(0, "0.2,0.2,0.2"),
          Row.of(1, "9 9 9"),
          Row.of(1, "9.1 9.1 9.1"),
          Row.of(1, "9.2 9.2 9.2")
        );
        BatchOperator <?> inOp = new MemSourceBatchOp(df, "id int, vec string");
        ClusterMetrics metrics = new EvalClusterBatchOp().setVectorCol("vec").setPredictionCol("id").linkFrom(inOp)
          .collectMetrics();
        System.out.println("Total Samples Number:" + metrics.getCount());
        System.out.println("Cluster Number:" + metrics.getK());
        System.out.println("Cluster Array:" + Arrays.toString(metrics.getClusterArray()));
        System.out.println("Cluster Count Array:" + Arrays.toString(metrics.getCountArray()));
        System.out.println("CP:" + metrics.getCp());
        System.out.println("DB:" + metrics.getDb());
        System.out.println("SP:" + metrics.getSp());
        System.out.println("SSB:" + metrics.getSsb());
        System.out.println("SSW:" + metrics.getSsw());
        System.out.println("CH:" + metrics.getVrc());
      }
    }

    运行结果

    Total Samples Number: 6
    Cluster Number: 2
    Cluster Array: ['0', '1']
    Cluster Count Array: [3.0, 3.0]
    CP: 0.11547005383792497
    DB: 0.014814814814814791
    SP: 15.588457268119896
    SSB: 364.5
    SSW: 0.1199999999999996
    CH: 12150.000000000042
  • 相关阅读:
    居然就这么没有了
    RAID4 in WAFL
    网络存储导论第15章:Netapp产品分析
    radwareAPSolute应用前端解决方案全局负载均衡解决方案
    RAID , LVM and EVMS
    FND_STANDARD.SET_WHO
    基于基表的Form开发
    eclipse pydev 升级地址
    .net程序员应该知道的
    收集利用Jquery取得iframe中元素
  • 原文地址:https://www.cnblogs.com/qiu-hua/p/14902458.html
Copyright © 2020-2023  润新知