• Introduction to Data Mining - 1


    data mining tasks:

    Classification [Predictive]
    Clustering [Descriptive]
    Association Rule Discovery [Descriptive]
    Sequential Pattern Discovery [Descriptive]
    Regression [Predictive]
    Deviation Detection [Predictive]

    categorical/qualitative
    1) Nominal:
    mode/entropy/contingency correlation/χ² (chi-square) test

    2) Ordinal:
    median/percentiles/rank correlation/run tests/sign tests

    numeric/quantitative

    3) Interval:
    mean/standard deviation/Pearson's correlation/t and F tests
    4) Ratio:
    geometric mean/harmonic mean/percent variation
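
The statistics listed for each attribute type can be tried out with a minimal sketch like the one below. It uses only the Python standard library; the sample values and the `entropy` helper are made up for illustration and are not from the notes.

```python
import math
from collections import Counter
from statistics import geometric_mean, harmonic_mean, mean, median, mode, stdev


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Nominal: only equality is meaningful, so mode and entropy apply.
colors = ["red", "blue", "red", "green"]  # illustrative data
color_mode = mode(colors)
color_entropy = entropy(colors)

# Ordinal: order is meaningful, so the median is well defined.
grades = [1, 2, 2, 3, 5]  # illustrative ranks
grade_median = median(grades)

# Interval: differences are meaningful, so mean and standard deviation apply.
temps_c = [10.0, 20.0, 30.0]  # illustrative temperatures in Celsius
temp_mean = mean(temps_c)
temp_sd = stdev(temps_c)  # sample standard deviation

# Ratio: a true zero exists, so multiplicative averages make sense.
growth = [1.0, 2.0, 4.0]  # illustrative ratios
growth_gmean = geometric_mean(growth)
growth_hmean = harmonic_mean(growth)

print(color_mode, color_entropy, grade_median, temp_mean, temp_sd)
print(growth_gmean, growth_hmean)
```

Each statistic is valid for its own level and for every level after it in the order nominal, ordinal, interval, ratio; for example, the mode can be computed for ratio data, but a geometric mean of nominal labels has no meaning.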


    data quality problems:

    1) Noise and outliers
    2) missing values
    why: 1. information was not collected; 2. an attribute is not applicable to all objects
    how: 1. eliminate data objects; 2. estimate missing values; 3. ignore missing values during analysis; 4. replace with all possible values (weighted by their probabilities)
    3) duplicate data
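
As a rough illustration of the first three ways of handling missing values, plus duplicate removal, here is a small sketch. The records, the field names `age` and `income`, and all function names are hypothetical; `None` is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 25, "income": 50.0},
    {"age": None, "income": 62.0},
    {"age": 40, "income": None},
    {"age": 31, "income": 58.0},
]


def drop_incomplete(rows):
    """Strategy 1: eliminate data objects that have any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows, key):
    """Strategy 2: estimate a missing value by the mean of the observed ones."""
    fill = mean(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


def mean_ignoring_missing(rows, key):
    """Strategy 3: ignore missing values during the analysis itself."""
    return mean(r[key] for r in rows if r[key] is not None)


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for r in rows:
        fingerprint = tuple(sorted(r.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(r)
    return unique


print(len(drop_incomplete(records)))
print(impute_mean(records, "age"))
print(mean_ignoring_missing(records, "income"))
print(len(drop_duplicates(records + records[:1])))
```

Mean imputation is only one possible estimator; it is used here because it is the simplest, not because the notes prescribe it.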


    data preprocessing:
    1) aggregation
    2) sampling
    3) dimensionality reduction
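    curse of dimensionality: as dimensionality increases, the data become increasingly sparse in the space they occupy, so density and the distance between points become less meaningful
    how: Principal Component Analysis (PCA); Singular Value Decomposition (SVD)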
    4) feature subset selection

    5) feature creation

    feature extraction: domain-specific
    mapping data to new space: Fourier transform/Wavelet transform
    feature construction: combining features

    6) discretization and binarization
    7) attribute transformation
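
For step 6, discretization and binarization, one simple possibility is equal-width binning followed by one-hot encoding. The sketch below assumes that choice; the function names and the sample ages are illustrative only.

```python
def equal_width_bins(values, k):
    """Discretize: map each value to one of k equal-width intervals (0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all values are equal
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def one_hot(index, k):
    """Binarize: encode a category index as k binary attributes."""
    return [1 if i == index else 0 for i in range(k)]


ages = [18, 22, 35, 47, 60]  # illustrative continuous attribute
bins = equal_width_bins(ages, 3)
binary = [one_hot(b, 3) for b in bins]

print(bins)    # [0, 0, 1, 2, 2]
print(binary)
```

Equal-width binning is sensitive to outliers, since one extreme value stretches every interval; equal-frequency binning is a common alternative.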





    Euclidean density = number of points per unit volume
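
One way to compute this density is to divide the space into equal grid cells and count the points in each cell. The sketch below assumes that grid-based approach and uniformly random points in the unit hypercube; `cell_density` and `occupied_fraction` are hypothetical helper names. It also hints at why density becomes less meaningful in high dimensions: with a fixed number of points, most cells end up empty.

```python
import random
from collections import Counter


def cell_density(points, cell_size):
    """Return {cell index tuple: points per unit volume} for a uniform grid."""
    dim = len(points[0])
    volume = cell_size ** dim
    counts = Counter(tuple(int(x // cell_size) for x in p) for p in points)
    return {cell: n / volume for cell, n in counts.items()}


def occupied_fraction(n_points, dim, cells_per_axis, seed=0):
    """Fraction of grid cells in the unit hypercube holding at least one point."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    density = cell_density(points, 1.0 / cells_per_axis)
    return len(density) / cells_per_axis ** dim


# Three 2-D points on a grid with cell size 0.5 (cell volume 0.25).
small = cell_density([(0.1, 0.1), (0.2, 0.3), (0.9, 0.9)], 0.5)
print(small)  # {(0, 0): 8.0, (1, 1): 4.0}

# The same 200 random points cover far fewer cells as the dimension grows.
frac_2d = occupied_fraction(200, dim=2, cells_per_axis=4)
frac_6d = occupied_fraction(200, dim=6, cells_per_axis=4)
print(frac_2d, frac_6d)
```

In six dimensions the grid has 4096 cells, so 200 points can occupy at most about 5% of them, whatever the random seed.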

  • Original article: https://www.cnblogs.com/pxy7896/p/6493064.html