Another challenging trend in Internet evolution is the tremendous growth of the infrastructure in every dimension, including bandwidth capacity of links (background). Most real-world applications of traffic classification require tools to work online, reporting live information or triggering action according to classification results (goal). But online traffic classification on modern links requires trade-offs (limitation) among accuracy, performance, and cost. The practical challenges have led to many published studies with limited evaluation in a simplified environment (existing work simply waters down the application scenario) rather than a systematic, rigorous analysis of these trade-offs. For example, in order to work online without custom (often prohibitively expensive) hardware (extra hardware support is a sensitive topic), complex DPI classifiers must sacrifice functionality, either analyzing a shorter portion of the payload stream of each traffic flow or simplifying their pattern matching approaches.
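The two DPI simplifications mentioned above (inspecting a shorter portion of the payload, and using a simpler matching approach) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the 64-byte limit, the `classify_payload` helper, and the tiny signature table are hypothetical and are not taken from the paper or from any real DPI engine.

```python
from typing import Dict, Optional

# Hypothetical cap on how many payload bytes are inspected per flow.
MAX_INSPECTED_BYTES = 64

# Hypothetical signature table: fixed byte prefixes instead of full
# regular expressions, i.e. a deliberately simplified matching approach.
SIGNATURES: Dict[str, bytes] = {
    "http": b"GET ",      # start of an HTTP GET request line
    "ssh": b"SSH-",       # SSH identification string
    "tls": b"\x16\x03",   # TLS handshake record, major version 3
}


def classify_payload(payload: bytes,
                     limit: int = MAX_INSPECTED_BYTES) -> Optional[str]:
    """Return a label based only on the first `limit` payload bytes."""
    head = payload[:limit]  # everything past the limit is never examined
    for label, prefix in SIGNATURES.items():
        if head.startswith(prefix):
            return label
    return None  # unknown: the cost of the reduced functionality
```

A signature that appears only deeper in the stream is missed by construction, which is the accuracy side of the trade-off: the per-flow work is bounded by `limit`, but anything outside the inspected window cannot influence the result.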
Machine learning techniques require similar compromises (ML methods likewise need to adapt their strategies to run online) to lower or bound the latency of classification during online execution. Data reduction is generally implemented by limiting the number of packets of a flow [9, 10] (method 1: use fewer packets) used for extracting classification features. Computational overhead is limited by reducing the set of features [11] used to classify traffic, ideally using features that can be extracted with low computational complexity (method 2: lower the complexity of feature extraction). Some features are not suitable for online classification because they are available only at the end of a flow (method 3: stop using end-of-flow features), such as total transferred bytes.
Limiting the number of packets used to extract features offers several benefits: lower feature extraction complexity; lower latency since classification can occur early in each traffic flow; and lower memory cost to maintain flow state during classification.
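As a rough illustration of the benefits listed above, the sketch below extracts features from only the first few packets of each flow. It is a sketch under stated assumptions, not the method of any cited study: the cutoff of 5 packets, the `EarlyFeatureExtractor` class, the flow key layout, and the chosen features (packet sizes plus elapsed time) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical flow key: (src ip, src port, dst ip, dst port, protocol).
FlowKey = Tuple[str, int, str, int, str]

MAX_PACKETS = 5  # hypothetical cutoff; real systems tune this value


@dataclass
class FlowState:
    sizes: List[int] = field(default_factory=list)
    first_ts: float = 0.0
    last_ts: float = 0.0
    done: bool = False  # True once features have been emitted


class EarlyFeatureExtractor:
    """Collects features from the first `max_packets` packets of a flow."""

    def __init__(self, max_packets: int = MAX_PACKETS) -> None:
        self.max_packets = max_packets
        self.flows: Dict[FlowKey, FlowState] = {}

    def observe(self, key: FlowKey, timestamp: float,
                size: int) -> Optional[List[float]]:
        """Feed one packet; return a feature vector once, else None."""
        state = self.flows.setdefault(key, FlowState(first_ts=timestamp))
        if state.done:
            return None  # later packets cost only a dictionary lookup
        state.sizes.append(size)
        state.last_ts = timestamp
        if len(state.sizes) < self.max_packets:
            return None
        state.done = True
        # Only features available early in the flow are used; end-of-flow
        # values such as total transferred bytes are deliberately absent.
        features = [float(s) for s in state.sizes]
        features.append(state.last_ts - state.first_ts)
        state.sizes = []  # per-packet state is no longer needed
        return features
```

The feature vector is available as soon as the cutoff packet arrives, so a classifier can act while the flow is still active, and the per-flow state shrinks to a few fields afterwards. A production version would also need to expire idle flows from `self.flows`, which this sketch omits.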
Dainotti A, Pescape A, Claffy K C. Issues and future directions in traffic classification[J]. IEEE Network, 2012, 26(1).