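This is a 2003 article from the Journal of Machine Learning Research; its title is the title of this post. The authors are Isabelle Guyon and André Elisseeff. I am not familiar with them, big names or not. What matters is that the paper has been cited 4,455 times (Google Scholar, as of May 31, 2013), so I had to read it.
The original abstract reads as follows: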
Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. (Very useful in practice.)
The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data.
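In other words, variable selection has three goals: improve the prediction performance of the predictors; yield faster and more cost-effective predictors; and give a better understanding of the process that generated the data.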
The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
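The special issue covers the following topics: defining the objective function, feature construction, feature ranking (what is that?), multivariate feature selection, efficient search methods, and methods for assessing feature validity.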
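On the feature ranking question: as far as I understand it, the idea is to score each variable on its own against the target and then sort the variables by that score. Below is a minimal sketch, assuming the absolute Pearson correlation as the scoring criterion. The function names `pearson` and `rank_features` and the toy data are my own and are not taken from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is treated as uninformative
    return cov / math.sqrt(var_x * var_y)

def rank_features(X, y):
    """Return (feature indices sorted best first, per-feature scores).

    Each feature is scored independently by |corr(feature, y)|.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append(abs(pearson(column, y)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores

# Toy data (made up): feature 0 tracks y closely, feature 1 barely does,
# feature 2 is constant.
X = [
    [1.0, 5.0, 3.0],
    [2.0, 1.0, 3.0],
    [3.0, 4.0, 3.0],
    [4.0, 2.0, 3.0],
    [5.0, 3.0, 3.0],
]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

order, scores = rank_features(X, y)
print(order)
```

On this toy data the ranking comes out as `[0, 1, 2]`. Because every variable is scored in isolation, this kind of ranking is cheap, but it cannot notice variables that are only useful in combination, which is presumably where the multivariate selection methods in the list come in.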