Paper Reading: A Brief Introduction to Weakly Supervised Learning

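Incomplete supervision: use unlabeled data to help training.

Inexact supervision: only coarse-grained labels are given, e.g. spam classification.

Inaccurate supervision: the labels are noisy, e.g. labels collected by crowdsourcing.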

Incomplete supervision

training data set \(D=\{(x_1,y_1),\cdots,(x_l,y_l),x_{l+1},\cdots,x_m\}\)

active learning (with human intervention)

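The labeling cost depends only on the number of queries, so the goal is to minimize the number of queries. Two selection criteria are widely used:
1. informativeness: how much an unlabeled instance helps reduce the uncertainty of a statistical model.
  
  1.1 Uncertainty sampling: train a single learner and query the instance on which it is least confident.
  
  1.2 Query-by-committee: train multiple learners and query the instance on which they disagree the most.
2. representativeness: how well an instance helps represent the structure of the input patterns.
  
  2.1 These approaches aim to exploit the cluster structure of the unlabeled data.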
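
The two informativeness criteria above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy probabilities, and the disagreement measure (one minus the share of the majority vote) are assumptions made here, not definitions from the paper.

```python
from collections import Counter


def least_confident(probabilities):
    """Uncertainty sampling: index of the instance whose most likely
    class has the lowest predicted probability.

    probabilities[i] is the class-probability list for instance i.
    """
    return min(range(len(probabilities)), key=lambda i: max(probabilities[i]))


def most_disagreed(committee_votes):
    """Query-by-committee: index of the instance with the largest disagreement.

    committee_votes[i] holds the labels predicted for instance i by each learner.
    Disagreement is measured (as an assumption) by 1 - share of the majority label.
    """
    def disagreement(votes):
        majority_count = Counter(votes).most_common(1)[0][1]
        return 1.0 - majority_count / len(votes)

    return max(range(len(committee_votes)),
               key=lambda i: disagreement(committee_votes[i]))


# Toy predictions for three unlabeled instances (made-up numbers).
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
votes = [["a", "a", "a"], ["a", "b", "a"], ["a", "b", "c"]]

query_us = least_confident(probs)    # instance 1: top probability is only 0.55
query_qbc = most_disagreed(votes)    # instance 2: all three learners differ
```

In both cases the selected index is the instance that would be sent to the human annotator next.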
semi-supervised learning (no human intervention is assumed)

Although the unlabeled data points carry no explicit label information, they implicitly convey information about the data distribution, which can help predictive modelling.

two basic assumptions: the cluster assumption (data have inherent cluster structure) and the manifold assumption (data lie on a manifold).
1. generative methods
  
  The labels of unlabeled instances can be treated as missing values of the model parameters and estimated by approaches such as the EM algorithm.
  
  To get good performance, one usually needs domain knowledge to determine an adequate generative model.
2. graph-based methods
  
  The performance depends heavily on how the graph is constructed.
3. low-density separation methods
  
  Semi-supervised support vector machines (S3VMs) try to identify a classification boundary that goes across the less dense region while keeping the labeled data correctly classified.
4. disagreement-based methods
  
  generate multiple learners and let them collaborate to exploit unlabeled data.
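
To make the graph-based family a little more concrete, here is a minimal sketch of one simple propagation scheme: labeled nodes are clamped, and each unlabeled node repeatedly takes the weighted average of its neighbors' scores. The function name, the chain graph, and the choice of this particular update rule are assumptions for illustration; the paper surveys several graph-based methods and this is not claimed to be any specific one of them.

```python
def propagate_labels(weights, labels, iterations=100):
    """Propagate binary label scores over a weighted graph.

    weights: symmetric matrix (list of lists) of non-negative edge weights.
    labels:  1.0 or 0.0 for labeled nodes, None for unlabeled nodes.
    Returns one score in [0, 1] per node; labeled nodes keep their label.
    """
    scores = [0.5 if y is None else float(y) for y in labels]
    for _ in range(iterations):
        updated = list(scores)
        for i, y in enumerate(labels):
            if y is not None:
                continue  # labeled nodes stay clamped to their given label
            total = sum(weights[i])
            if total > 0:
                # weighted average of the neighbors' current scores
                updated[i] = sum(w * s for w, s in zip(weights[i], scores)) / total
        scores = updated
    return scores


# Chain graph 0 - 1 - 2 - 3 with unit edge weights (toy example).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Node 0 is labeled positive, node 3 negative, nodes 1 and 2 are unlabeled.
scores = propagate_labels(chain, [1.0, None, None, 0.0])
predicted = [s > 0.5 for s in scores]
```

Changing the edge weights in `chain` changes the scores of the unlabeled nodes, which illustrates why the result depends so strongly on how the graph is built.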
Inexact Supervision

Multi-instance learning: predict the labels of unseen bags. A bag \(X_i\) is positive if there exists an instance \(x_{ip}\) in it that is positive, while the index \(p\) is unknown.
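
The bag-labeling rule can be written down directly: a bag is positive as soon as any one of its instances is positive. In the sketch below, `predict_bag`, the threshold-based `is_positive_instance`, and the toy bags are all hypothetical names and values chosen for illustration; in practice the instance-level classifier would have to be learned from bag-level labels only.

```python
def predict_bag(bag, instance_classifier):
    """Label a bag positive if at least one of its instances is classified positive."""
    return any(instance_classifier(x) for x in bag)


def is_positive_instance(x):
    """Hypothetical instance-level classifier: a fixed threshold on one feature."""
    return x > 0.5


# Two toy bags of one-dimensional instances (made-up values).
bags = [
    [0.1, 0.2, 0.3],  # no instance exceeds the threshold
    [0.1, 0.9, 0.2],  # the second instance is positive
]
bag_predictions = [predict_bag(bag, is_positive_instance) for bag in bags]
```

Note that the rule never reveals which instance made the bag positive, matching the fact that \(p\) is unknown.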

Inaccurate Supervision

For machine learning, crowdsourcing is commonly used as a cost-saving way to collect labels for training data.
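
Labels collected from several crowd workers have to be combined into one label per item. A common baseline for this is majority voting, sketched below. The item names, the worker labels, and the helper `majority_vote` are assumptions for illustration; ties are not handled specially here and the example avoids them.

```python
from collections import Counter


def majority_vote(worker_labels):
    """Return the label given by the largest number of workers."""
    return Counter(worker_labels).most_common(1)[0][0]


# Labels from three hypothetical workers for three hypothetical items.
crowd_labels = {
    "img_1": ["cat", "cat", "dog"],
    "img_2": ["dog", "dog", "dog"],
    "img_3": ["cat", "dog", "dog"],
}
aggregated = {item: majority_vote(labels) for item, labels in crowd_labels.items()}
```

The aggregated labels can still be wrong when most workers err on an item, which is why this setting counts as inaccurate supervision.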
Original post: https://www.cnblogs.com/blueprintf/p/8533759.html
