• PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data


    From: Stanford University; Jure Leskovec, 60,000+ citations.

    Problem:

    subsequence clustering.

    Challenging:

    Discovering patterns is challenging because it requires simultaneous segmentation and clustering of the time series, and interpreting the clustering results is difficult.

    Why is discovering time series patterns a challenge? Think about it yourself: there are already so many distance measures (DTW, manifold distance) and clustering methods (k-NN, k-means, etc.). But I admit the interpretation is difficult.

    Introduction:

    long time series ----breakdown----> a sequence of states/patterns ----> the time series can be expressed as a sequential timeline of a few key states ----> discover repeated patterns / understand trends / detect anomalies / better interpret large, high-dimensional datasets.
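
    For example, once every timestep has a state assignment, the long series collapses into a short timeline of (state, start, end) segments. A minimal sketch; the label sequence below is made up purely for illustration:

```python
# Collapse a per-timestep state assignment into a timeline of
# (state, start, end) segments. The labels are hypothetical.
from itertools import groupby

labels = [0, 0, 0, 2, 2, 1, 1, 1, 1, 0, 0]   # hypothetical cluster label per timestep

segments, start = [], 0
for state, run in groupby(labels):
    length = len(list(run))
    segments.append((state, start, start + length - 1))
    start += length

print(segments)   # [(0, 0, 2), (2, 3, 4), (1, 5, 8), (0, 9, 10)]
```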

    Key steps: simultaneously segment and cluster the time series.

    Unsupervised learning: hard to interpret; after clustering, you still have to inspect the data itself.

    how to discover interpretable structure in the data?

    Traditional clustering methods are not particularly well-suited to discovering interpretable structure in the data, because they typically rely on distance-based metrics.

    distance-based metrics, DTW.

    Distance-based algorithms are at a disadvantage on multivariate time series: they cannot capture subtle structural similarities in the data.

    Proposes a new method for multivariate time series clustering, TICC:

    • define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
    • each cluster is a Markov random field (MRF).
    • In these MRFs, an edge represents a partial correlation between two variables.
    • learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix. 
    • This network has multiple layers. 
    • the number of layers corresponds to the window size of a short subsequence.
    • the inverse covariance matrix defines the adjacency matrix of the MRF dependency network (see the sketch below this list).
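
    A rough sketch of the idea (not the authors' block-Toeplitz ADMM solver): stack each length-w window into one long vector, estimate a sparse inverse covariance with scikit-learn's GraphicalLasso as a stand-in, and read the nonzero pattern off as the MRF adjacency matrix over (sensor, time-offset) nodes. The data, window size, and regularization below are made-up assumptions.

```python
# Sketch: a sparse inverse covariance on stacked windows acts as the
# adjacency matrix of an MRF whose nodes are (sensor, time-offset) pairs.
# GraphicalLasso is a stand-in for TICC's constrained solver.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
T, n, w = 500, 3, 2                      # timesteps, sensors, window size (illustrative)
X = rng.standard_normal((T, n))
X[:, 1] += 0.8 * X[:, 0]                 # inject a cross-sensor dependency

# Stack w consecutive observations into one n*w vector per window.
windows = np.hstack([X[t:T - w + 1 + t] for t in range(w)])    # (T-w+1, n*w)

theta = GraphicalLasso(alpha=0.1).fit(windows).precision_      # (n*w, n*w)

# Nonzero off-diagonal entries are the edges of the dependency network;
# index i corresponds to sensor i % n at time offset i // n.
adjacency = (np.abs(theta) > 1e-4) & ~np.eye(n * w, dtype=bool)
print(adjacency.astype(int))
```

    TICC additionally constrains this inverse covariance to be block Toeplitz, so the dependency structure is the same at every time offset inside the window.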

    Related work: 

    time series clustering and convex optimization;

    variations of DTW; symbolic representations; rule-based motif discovery;

    However, these methods generally rely on distance-based metrics. 

    TICC ------ a model-based clustering method, like ARMA, Gaussian mixture, or hidden Markov models.

    • define each cluster by a Gaussian inverse covariance. 
    • so the Gaussian inverse covariance defines a Markov random field encoding the structural representation.  
    • K clusters / K inverse covariances (assignment sketched below).
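
    As a hedged sketch of what "model-based" means here: each stacked window is scored under every cluster's zero-mean Gaussian with inverse covariance theta_k and assigned to the best-scoring cluster. TICC also adds a switching penalty for temporal consistency, omitted here; the inputs are placeholders.

```python
# Sketch: assign each window to the cluster whose Gaussian
# (mean 0, inverse covariance theta_k) gives the highest log-likelihood.
# TICC's temporal-consistency (switching) penalty is omitted.
import numpy as np

def assign_clusters(windows, thetas):
    """windows: (m, d) stacked windows; thetas: list of K (d, d) inverse covariances."""
    scores = []
    for theta in thetas:
        _, logdet = np.linalg.slogdet(theta)
        quad = np.einsum('md,de,me->m', windows, theta, windows)   # x^T Theta x per window
        scores.append(0.5 * logdet - 0.5 * quad)                   # log-density up to a constant
    return np.argmax(np.stack(scores, axis=1), axis=1)             # cluster index per window
```

    In the full method this assignment step alternates, EM-style, with re-estimating each theta_k from the windows currently assigned to cluster k.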

    Selecting the number of clusters: cross-validation; normalized mutual information; BIC or silhouette score.
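
    One possible sketch for the silhouette option (k-means on the stacked windows is only a stand-in here, since the silhouette score just needs features and labels; the candidate range is arbitrary):

```python
# Sketch: choose the number of clusters K by silhouette score,
# using k-means on the stacked windows as a simple stand-in clustering.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_k(windows, candidates=(2, 3, 4, 5, 6)):
    best_k, best_score = None, -1.0
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(windows)
        score = silhouette_score(windows, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```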

     I can't really understand this part T T

    Supplementary knowledge:

    1. For unsupervised learning, interpreting the results and choosing intermediate parameters currently relies entirely on experience.

    2. Aarhus data, Martin, used for multivariate time series forecasting.

    3. Toeplitz matrices: constant-diagonal matrices, i.e. each descending diagonal holds a single constant value (see the small example after this list).

    4. ticc code
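
    A quick illustration of the constant-diagonal property with scipy (the values are arbitrary):

```python
# Toeplitz matrix: every descending diagonal holds one constant value.
from scipy.linalg import toeplitz

print(toeplitz(c=[1, 2, 3, 4], r=[1, 5, 6, 7]))   # c = first column, r = first row
# [[1 5 6 7]
#  [2 1 5 6]
#  [3 2 1 5]
#  [4 3 2 1]]
```

    In TICC, each cluster's inverse covariance is constrained to be block Toeplitz, which makes the dependency structure time-invariant within the window.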

    Reference: 

    1. How to explain the conditional random field (CRF) model with a simple, easy-to-understand example?
