• Machine Learning Paper Notes (2)


    Sparse Subspace Clustering: Algorithm, Theory, and Applications.

    Abstract: Many real-world problems deal with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.

     
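    Translation notes: Many real-world problems involve collections of high-dimensional data such as images, videos, text and web documents, and DNA microarray data. Such data usually lie close to low-dimensional structures that correspond to the classes or categories the data belong to. The paper proposes and studies an algorithm called sparse subspace clustering, which clusters data points lying in a union of low-dimensional subspaces. The key idea: of the infinitely many ways to represent a data point in terms of the other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is fed into a spectral clustering framework to assign the data to subspaces. Because the sparse optimization program is NP-hard in general, the authors consider a convex relaxation and show that, under suitable conditions on the arrangement of the subspaces and the distribution of the data, the relaxed minimization program recovers the desired sparse representations. The algorithm is efficient and can handle data points near the intersections of subspaces. A further advantage over the state of the art is that it deals directly with data nuisances (noise, sparse outlying entries, and missing entries) by incorporating the data model into the sparse optimization program. Its effectiveness is demonstrated on synthetic data and on two real-world problems: motion segmentation and face clustering.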
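The two stages described in the abstract (a sparse self-representation followed by spectral clustering) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the penalized Lasso form of the convex relaxation, the plain ISTA solver, the small k-means routine, the synthetic two-subspace data, and all function names and parameter values (`lam`, `n_iter`, and so on).

```python
import numpy as np

def sparse_self_representation(Y, lam=0.05, n_iter=500):
    """ISTA for min_C 0.5*||Y - Y C||_F^2 + lam*||C||_1 subject to diag(C) = 0."""
    n = Y.shape[1]
    G = Y.T @ Y                               # Gram matrix of the data points
    step = 1.0 / np.linalg.norm(G, 2)         # 1 / Lipschitz constant of the gradient
    C = np.zeros((n, n))
    for _ in range(n_iter):
        C = C - step * (G @ C - G)            # gradient step on the quadratic term
        C = np.sign(C) * np.maximum(np.abs(C) - step * lam, 0.0)  # soft threshold
        np.fill_diagonal(C, 0.0)              # a point may not represent itself
    return C

def spectral_clustering(W, k, n_iter=50):
    """Normalized-Laplacian spectral clustering with a small deterministic k-means."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)               # eigenvalues come back in ascending order
    U = vecs[:, :k]                           # eigenvectors of the k smallest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Farthest-point initialization keeps the example deterministic.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Synthetic data (assumed setup): two random 3-dimensional subspaces in R^30,
# 40 points each, plus a little Gaussian noise; columns are normalized.
rng = np.random.default_rng(0)
ambient, dim, n_per = 30, 3, 40
blocks = []
for _ in range(2):
    basis = np.linalg.qr(rng.standard_normal((ambient, dim)))[0]
    blocks.append(basis @ rng.standard_normal((dim, n_per)))
Y = np.hstack(blocks) + 0.01 * rng.standard_normal((ambient, 2 * n_per))
Y /= np.linalg.norm(Y, axis=0, keepdims=True)
truth = np.repeat([0, 1], n_per)

C = sparse_self_representation(Y)
W = np.abs(C) + np.abs(C).T                   # symmetric affinity from the coefficients
labels = spectral_clustering(W, k=2)

# Cluster labels are only defined up to a permutation, so score both assignments.
accuracy = max(np.mean(labels == truth), np.mean(labels != truth))
print("clustering accuracy:", accuracy)
```

The penalized form tolerates noise at the cost of one extra parameter, `lam`; the value used here is an untuned guess for unit-norm columns. For real data, a dedicated Lasso solver and a library k-means would likely be a better choice than these hand-rolled loops.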
     
     
  • Original post: https://www.cnblogs.com/ZhangWj-/p/12922398.html