• python 机器学习库 —— featuretools(自动特征工程)


    文档:https://docs.featuretools.com/#minute-quick-start

    所谓自动特征工程,即是将人工特征工程的过程自动化。以 featuretools 为代表的自动特征工程在整个机器学习的端到端实践中扮演的角色如下图所示:


    这里写图片描述

    1. demo

    • 导入包:import featuretools as ft
    • 加载数据:data = ft.demo.load_mock_customer(),data 为 dict 类型
      • data.keys() ⇒ dict_keys([‘transactions’, ‘products’, ‘customers’, ‘sessions’])
        • 顾客发生了多次购买会话(session),每次会话产生了多次交易(transaction)
      • data[‘customers’] ⇒ DataFrame(Pandas)
    • 整理数据:

      customers_df = data['customers']
      sessions_df = data['sessions']
      transactions_df = data['transactions']
    • 构建数据集

      entities = {
         ...:    "customers" : (customers_df, "customer_id"),
         ...:    "sessions" : (sessions_df, "session_id", "session_start"),
         ...:    "transactions" : (transactions_df, "transaction_id", "transaction_time")
         ...: }
    • 指定关系:父实体与子实体的关系,通过如下四元组来定义:

      (parent_entity, parent_variable, child_entity, child_variable)

      接下来定义如下的关系:

      relationships = [("sessions", "session_id", "transactions", "session_id"),
                       ("customers", "customer_id", "sessions", "customer_id")]

    2. DFS:Deep Feature Synthesis,深度特征合成

    feature_matrix_customers, features_defs = ft.dfs(entities=entities, relationships=relationships, target_entity="customers"
  • 相关阅读:
    安装IIS
    SQL 通过某个字段名称找到数据库中对应的表
    javascript 操作 drop down list
    The project type is not supported by this installationVS2005
    Get 和 Post 简介
    .Net 控件调用 javascript事件
    JQuery检测浏览器版本
    开车要点
    linux shell工程师要求
    memory management
  • 原文地址:https://www.cnblogs.com/mtcnn/p/9421007.html
Copyright © 2020-2023  润新知