• Paper Reading: A Brief Introduction to Weakly Supervised Learning


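    incomplete supervision: we want to use unlabeled data to help training

    inexact supervision: the data carry only coarse-grained labels, e.g. spam classification

    inaccurate supervision: the labels are noisy, e.g. labels collected by crowdsourcing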

    Incomplete Supervision

    training data set \(D=\{(x_1,y_1),\cdots,(x_l,y_l),x_{l+1},\cdots,x_m\}\)

    active learning (with human intervention)

    the labeling cost only depends on the number of queries

    1. informativeness: how well an unlabeled instance helps reduce the uncertainty of a statistical model.

      1.1 uncertainty sampling: train a single learner and query the unlabeled instance on which it is least confident

      1.2 query-by-committee: train multiple learners and query the unlabeled instance on which they disagree the most

    2. representativeness: how well an instance helps represent the structure of the input patterns

      2.1 these approaches aim to exploit the cluster structure of the unlabeled data
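
    The two informativeness criteria above can be sketched in a few lines of plain Python. This is only an illustration: the function names, the use of vote entropy as the disagreement measure, and the toy inputs are assumptions of this sketch and do not come from the paper.

```python
import math
from collections import Counter


def least_confidence_query(probs):
    """Uncertainty sampling: pick the instance whose most likely class
    has the lowest predicted probability (hypothetical helper)."""
    return min(range(len(probs)), key=lambda i: max(probs[i]))


def vote_entropy_query(votes):
    """Query-by-committee: pick the instance whose committee votes have
    the highest entropy, i.e. the most disagreement (hypothetical helper)."""
    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())
    return max(range(len(votes)), key=lambda i: entropy(votes[i]))


# Toy inputs (made up): class probabilities from one learner,
# and predicted labels from a committee of three learners.
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
votes = [["a", "a", "a"], ["a", "b", "a"], ["a", "b", "c"]]

lc_index = least_confidence_query(probs)
qbc_index = vote_entropy_query(votes)
```

    In both cases the returned index is the instance that would be sent to the human oracle next, so the labeling cost grows with the number of such queries.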

    semi-supervised learning (no human intervention is assumed)

    Although the unlabeled data points do not carry explicit label information, they implicitly convey information about the data distribution, which can be helpful for predictive modelling.

    two basic assumptions: the cluster assumption (data have inherent cluster structure) and the manifold assumption (data lie on a manifold).

    1. generative methods

      the labels of unlabeled instances are treated as missing values of the model parameters and estimated by approaches such as the EM algorithm.

      To get good performance, one usually needs domain knowledge to choose an adequate generative model.

    2. graph based methods

      the performance depends heavily on how the graph is constructed.

    3. low-density separation methods

      S3VMs (semi-supervised support vector machines) try to find a classification boundary that passes through low-density regions while keeping the labeled data correctly classified.

    4. disagreement-based methods

      generate multiple learners and let them collaborate to exploit unlabeled data.
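
    As an illustration of the graph-based family, the sketch below runs a simple iterative label propagation on a hand-built graph. It is one possible scheme and not necessarily the method discussed in the paper; the function name `propagate_labels`, the chain graph, and the iteration count are assumptions of this sketch.

```python
def propagate_labels(weights, labeled, n_iter=200):
    """Iteratively set each unlabeled node's score to the weighted
    average of its neighbours' scores; labeled nodes stay clamped.

    weights: symmetric adjacency matrix (list of lists)
    labeled: dict mapping node index -> known score in [0, 1]
    """
    n = len(weights)
    scores = [labeled.get(i, 0.5) for i in range(n)]
    for _ in range(n_iter):
        new_scores = list(scores)
        for i in range(n):
            if i in labeled:
                continue  # keep the given labels fixed
            total = sum(weights[i])
            if total > 0:
                new_scores[i] = sum(w * s for w, s in zip(weights[i], scores)) / total
        scores = new_scores
    return scores


# Toy chain graph 0 - 1 - 2 - 3 - 4 (made up); only the two ends are labeled.
chain = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
scores = propagate_labels(chain, {0: 0.0, 4: 1.0})
```

    Changing the edges or their weights in `chain` changes the propagated scores, which is why the result is so sensitive to how the graph is constructed.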

    Inexact Supervision

    Multi-instance learning: predict the labels of unseen bags. A bag \(X_i\) is positive if it contains at least one positive instance \(x_{ip}\), where the index \(p\) is unknown.
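
    The bag-labeling rule can be written down directly. The sketch below assumes some instance-level scorer already exists and thresholds its scores at 0.5; the function names, the threshold, and the toy scores are assumptions for illustration only.

```python
def bag_is_positive(instance_scores, threshold=0.5):
    """A bag is positive if at least one of its instances is positive."""
    return any(score >= threshold for score in instance_scores)


def key_instance(instance_scores):
    """Index p of the highest-scoring instance, i.e. the best guess for
    the (unknown) instance that makes the bag positive."""
    return max(range(len(instance_scores)), key=lambda p: instance_scores[p])


# Toy bags (made up): each bag is a list of instance-level scores.
bag_a = [0.1, 0.8, 0.2]
bag_b = [0.1, 0.3, 0.2]

label_a = bag_is_positive(bag_a)
label_b = bag_is_positive(bag_b)
p_a = key_instance(bag_a)
```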

    Inaccurate Supervision

    For machine learning, crowdsourcing is commonly used as a cost-saving way to collect labels for training data.
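
    Because crowdsourced labels are noisy, several workers are usually asked to label the same item and their answers are aggregated. The sketch below uses plain majority voting, a common baseline; the note above does not say which aggregation scheme is used, so the function name and the toy data are assumptions.

```python
from collections import Counter


def majority_vote(worker_labels):
    """Return the label given by the most workers.
    Ties go to the label that was encountered first."""
    return Counter(worker_labels).most_common(1)[0][0]


# Toy data (made up): labels from three workers for each item.
crowd_labels = {
    "email_1": ["spam", "spam", "ham"],
    "email_2": ["ham", "ham", "ham"],
}
aggregated = {item: majority_vote(labels) for item, labels in crowd_labels.items()}
```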

  • Original article: https://www.cnblogs.com/blueprintf/p/8533759.html