• ALINK (28): Feature Engineering (7), Feature Combination and Crossing (2): Cross Feature Prediction/Training (CrossFeaturePredictBatchOp)


    Cross Feature Prediction (CrossFeaturePredictBatchOp)

    Java class name: com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp

    Python class name: CrossFeaturePredictBatchOp

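    Description

    The feature-crossing algorithm combines the selected discrete columns into a single vector-typed column. This operator applies a model trained by CrossFeatureTrainBatchOp: its first input is the trained model and its second input is the data to transform.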
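    The crossing step can be sketched in plain Python, assuming the model is simply a list of categories per selected column and that a row's categories are combined into one mixed-radix index with the first column varying fastest. MODEL and cross are hypothetical names, not Alink APIs. The category orders were chosen by working backwards from the sample output in this article so that the sketch reproduces it; Alink's real ordering rule is not documented here, so treat this as an illustration rather than Alink's implementation.

```python
# Hypothetical trained model: the categories of each selected column.
# The orders are inferred from this article's sample output (an assumption).
MODEL = {
    "f0": ["1.0", "2.0", "0.0"],
    "f1": ["1.0", "3.0", "0.0"],
    "f2": [1.0, 0.0, None, 2.0],  # None (null) is treated as its own category
}
COLS = ["f0", "f1", "f2"]


def cross(row):
    """Combine the selected columns of one row into a single one-hot
    sparse vector, written as "$size$index:1.0"."""
    index, stride = 0, 1
    for col in COLS:
        categories = MODEL[col]
        index += categories.index(row[col]) * stride  # mixed-radix digit
        stride *= len(categories)  # after the loop, stride is the vector size
    return f"${stride}${index}:1.0"


print(cross({"f0": "2.0", "f1": "3.0", "f2": None}))  # prints $36$22:1.0
```

    The vector size is the product of the column cardinalities (3 × 3 × 4 = 36 here), so every combination of categories gets its own slot.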

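    Parameters

    Name                    | Description                                                                 | Type    | Required | Default
    ------------------------|-----------------------------------------------------------------------------|---------|----------|--------
    outputCol               | Name of the output column                                                   | String  | Yes      |
    numThreads              | Number of threads used by the component                                     | Integer | No       | 1
    modelStreamFilePath     | File path of the model stream                                               | String  | No       | null
    modelStreamScanInterval | Interval at which the model path is scanned, in seconds                     | Integer | No       | 10
    modelStreamStartTime    | Start time of the model stream; by default, reading starts from the current moment. Use the format yyyy-mm-dd hh:mm:ss.fffffffff (see Timestamp.valueOf(String s)) | String | No | null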

    Code Examples

    Python code

    from pyalink.alink import *
    import pandas as pd
    useLocalEnv(1)
    # Sample data: f0 and f1 are strings, f2 is a double that contains a null
    df = pd.DataFrame([
        ["1.0", "1.0", 1.0, 1],
        ["1.0", "1.0", 0.0, 1],
        ["1.0", "0.0", 1.0, 1],
        ["1.0", "0.0", 1.0, 1],
        ["2.0", "3.0", None, 0],
        ["2.0", "3.0", 1.0, 0],
        ["0.0", "1.0", 2.0, 0],
        ["0.0", "1.0", 1.0, 0]])
    data = BatchOperator.fromDataframe(df, schemaStr="f0 string, f1 string, f2 double, label bigint")
    # Train the crossing model on the columns to combine
    train = CrossFeatureTrainBatchOp().setSelectedCols(['f0','f1','f2']).linkFrom(data)
    # Apply the model (first input) to the data (second input); the result is written to "cross"
    print(CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).collectToDataFrame())

    Java code

    import org.apache.flink.types.Row;
    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp;
    import com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import org.junit.Test;
    import java.util.Arrays;
    import java.util.List;
    public class CrossFeaturePredictBatchOpTest {
      @Test
      public void testCrossFeaturePredictBatchOp() throws Exception {
        List <Row> df = Arrays.asList(
          Row.of("1.0", "1.0", 1.0, 1),
          Row.of("1.0", "1.0", 0.0, 1),
          Row.of("1.0", "0.0", 1.0, 1),
          Row.of("1.0", "0.0", 1.0, 1),
          Row.of("2.0", "3.0", null, 0),
          Row.of("2.0", "3.0", 1.0, 0),
          Row.of("0.0", "1.0", 2.0, 0),
          Row.of("0.0", "1.0", 1.0, 0)
        );
        BatchOperator <?> data = new MemSourceBatchOp(df, "f0 string, f1 string, f2 double, label int");
        BatchOperator <?> train = new CrossFeatureTrainBatchOp().setSelectedCols("f0", "f1", "f2").linkFrom(data);
        new CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).print();
      }
    }

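    Output

    f0  | f1  | f2     | label | cross
    ----|-----|--------|-------|-----------
    1.0 | 1.0 | 1.0000 | 1     | $36$0:1.0
    1.0 | 1.0 | 0.0000 | 1     | $36$9:1.0
    1.0 | 0.0 | 1.0000 | 1     | $36$6:1.0
    1.0 | 0.0 | 1.0000 | 1     | $36$6:1.0
    2.0 | 3.0 | null   | 0     | $36$22:1.0
    2.0 | 3.0 | 1.0000 | 0     | $36$4:1.0
    0.0 | 1.0 | 2.0000 | 0     | $36$29:1.0
    0.0 | 1.0 | 1.0000 | 0     | $36$2:1.0

    The cross column holds a sparse vector written as $size$index:value. Every row is one-hot in a 36-dimensional space, which is consistent with 3 distinct values of f0 × 3 of f1 × 4 of f2 (null counts as a value of f2).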
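    To inspect the cross column programmatically, a small parser for the $size$index:value string form is enough. parse_sparse is a hypothetical helper, not part of Alink, and it assumes that multiple entries would be separated by spaces (every value in this article has a single entry). It checks that each of the eight output values is a one-hot vector of size 36.

```python
def parse_sparse(text):
    """Parse a sparse vector string such as "$36$22:1.0" into
    (size, {index: value}); size is None if there is no "$size$" prefix."""
    size = None
    body = text.strip()
    if body.startswith("$"):
        # "$36$22:1.0".split("$", 2) -> ["", "36", "22:1.0"]
        _, size_str, body = body.split("$", 2)
        size = int(size_str)
    entries = {}
    for token in body.split():  # entries assumed to be space-separated
        idx, val = token.split(":")
        entries[int(idx)] = float(val)
    return size, entries


# The eight values of the cross column in the output table.
OUTPUTS = ["$36$0:1.0", "$36$9:1.0", "$36$6:1.0", "$36$6:1.0",
           "$36$22:1.0", "$36$4:1.0", "$36$29:1.0", "$36$2:1.0"]

print(parse_sparse(OUTPUTS[4]))  # prints (36, {22: 1.0})
# Every row is a one-hot vector of size 36.
print(all(parse_sparse(s)[0] == 36 and list(parse_sparse(s)[1].values()) == [1.0]
          for s in OUTPUTS))  # prints True
```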

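    Cross Feature Training (CrossFeatureTrainBatchOp)

    Java class name: com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp

    Python class name: CrossFeatureTrainBatchOp

    Description

    The feature-crossing algorithm combines the selected discrete columns into a single vector-typed column. This operator trains the crossing model on the columns listed in selectedCols; CrossFeaturePredictBatchOp then applies that model to data.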
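    What training has to capture can be sketched in plain Python, under the assumption that the model essentially records the distinct values of each selected column (with null kept as its own value). This does not use Alink, and distinct_values is a hypothetical helper. Counting the distinct values of the sample data shows where the vector size 36 in the output comes from.

```python
from math import prod

# Sample rows from this article: (f0, f1, f2, label); None stands for null.
ROWS = [
    ("1.0", "1.0", 1.0, 1),
    ("1.0", "1.0", 0.0, 1),
    ("1.0", "0.0", 1.0, 1),
    ("1.0", "0.0", 1.0, 1),
    ("2.0", "3.0", None, 0),
    ("2.0", "3.0", 1.0, 0),
    ("0.0", "1.0", 2.0, 0),
    ("0.0", "1.0", 1.0, 0),
]


def distinct_values(rows, col):
    """Distinct values of column position `col` in first-appearance order.
    None is kept as a category of its own (assumption based on the sample output)."""
    seen = []
    for row in rows:
        if row[col] not in seen:
            seen.append(row[col])
    return seen


# Cardinality of each selected column (f0, f1, f2 are positions 0, 1, 2).
cards = [len(distinct_values(ROWS, col)) for col in range(3)]
print(cards, prod(cards))  # prints [3, 3, 4] 36
```

    The first-appearance order used here is only for illustration; the order Alink assigns to categories may differ.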

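    Parameters

    Name         | Description                                                     | Type     | Required | Default
    -------------|-----------------------------------------------------------------|----------|----------|--------
    selectedCols | Names of the columns used in the computation (the columns to cross) | String[] | Yes      |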

    Code Examples

    Python code

    from pyalink.alink import *
    import pandas as pd
    useLocalEnv(1)
    # Sample data: f0 and f1 are strings, f2 is a double that contains a null
    df = pd.DataFrame([
        ["1.0", "1.0", 1.0, 1],
        ["1.0", "1.0", 0.0, 1],
        ["1.0", "0.0", 1.0, 1],
        ["1.0", "0.0", 1.0, 1],
        ["2.0", "3.0", None, 0],
        ["2.0", "3.0", 1.0, 0],
        ["0.0", "1.0", 2.0, 0],
        ["0.0", "1.0", 1.0, 0]])
    data = BatchOperator.fromDataframe(df, schemaStr="f0 string, f1 string, f2 double, label bigint")
    # Train the crossing model on the columns to combine
    train = CrossFeatureTrainBatchOp().setSelectedCols(['f0','f1','f2']).linkFrom(data)
    # Apply the model (first input) to the data (second input); the result is written to "cross"
    print(CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).collectToDataFrame())

    Java code

    import org.apache.flink.types.Row;
    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp;
    import com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import org.junit.Test;
    import java.util.Arrays;
    import java.util.List;
    public class CrossFeatureTrainBatchOpTest {
      @Test
      public void testCrossFeatureTrainBatchOp() throws Exception {
        List <Row> df = Arrays.asList(
          Row.of("1.0", "1.0", 1.0, 1),
          Row.of("1.0", "1.0", 0.0, 1),
          Row.of("1.0", "0.0", 1.0, 1),
          Row.of("1.0", "0.0", 1.0, 1),
          Row.of("2.0", "3.0", null, 0),
          Row.of("2.0", "3.0", 1.0, 0),
          Row.of("0.0", "1.0", 2.0, 0),
          Row.of("0.0", "1.0", 1.0, 0)
        );
        BatchOperator <?> data = new MemSourceBatchOp(df, "f0 string, f1 string, f2 double, label int");
        BatchOperator <?> train = new CrossFeatureTrainBatchOp().setSelectedCols("f0", "f1", "f2").linkFrom(data);
        new CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).print();
      }
    }

    Output

    f0  | f1  | f2     | label | cross
    ----|-----|--------|-------|-----------
    1.0 | 1.0 | 1.0000 | 1     | $36$0:1.0
    1.0 | 1.0 | 0.0000 | 1     | $36$9:1.0
    1.0 | 0.0 | 1.0000 | 1     | $36$6:1.0
    1.0 | 0.0 | 1.0000 | 1     | $36$6:1.0
    2.0 | 3.0 | null   | 0     | $36$22:1.0
    2.0 | 3.0 | 1.0000 | 0     | $36$4:1.0
    0.0 | 1.0 | 2.0000 | 0     | $36$29:1.0
    0.0 | 1.0 | 1.0000 | 0     | $36$2:1.0
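    Working backwards from the output table, every cross index is consistent with a mixed-radix encoding in which f0 varies fastest and the categories are ordered f0: 1.0, 2.0, 0.0; f1: 1.0, 3.0, 0.0; f2: 1.0, 0.0, null, 2.0. These orders were inferred from this single example and are an assumption, not documented Alink behavior. The sketch decodes an index back into column values (decode is a hypothetical helper) and checks all eight rows of the table.

```python
# Category orders inferred from the sample output (assumption, not Alink docs).
ORDERS = [
    ("f0", ["1.0", "2.0", "0.0"]),
    ("f1", ["1.0", "3.0", "0.0"]),
    ("f2", [1.0, 0.0, None, 2.0]),  # None stands for null
]


def decode(index):
    """Invert a mixed-radix index in which the first column varies fastest."""
    values = {}
    for name, categories in ORDERS:
        index, digit = divmod(index, len(categories))
        values[name] = categories[digit]
    return values


# (f0, f1, f2) and the cross index for each row of the output table.
SAMPLE = [
    (("1.0", "1.0", 1.0), 0),
    (("1.0", "1.0", 0.0), 9),
    (("1.0", "0.0", 1.0), 6),
    (("1.0", "0.0", 1.0), 6),
    (("2.0", "3.0", None), 22),
    (("2.0", "3.0", 1.0), 4),
    (("0.0", "1.0", 2.0), 29),
    (("0.0", "1.0", 1.0), 2),
]

matches = all(
    decode(idx) == {"f0": f0, "f1": f1, "f2": f2}
    for (f0, f1, f2), idx in SAMPLE
)
print(matches)  # prints True
```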

  • Original article: https://www.cnblogs.com/qiu-hua/p/14901491.html