Unsupervised Learning: Introduction(非监督学习概述)
监督学习与非监督学习比较:监督学习需要输入训练数据的同时,需要输入带有标签的事件结果,而非监督学习只需要提供数据训练数据,算法自动发现数据的内部结构。集群是非监督学习算法的一种。
In unsupervised learning, the training set is of the form {x^{(1)},x^{(2)},dots,x^{(m)}}{x(1),x(2),…,x(m)} without labels y^{(i)}y(i).;
In unsupervised learning, you are given an unlabeled dataset and are asked to find "structure" in the data.
Clustering is an example of unsupervised learning.
clustering(集群)
In the clustering problem we are given an unlabeled data set and we would like to have an algorithm automatically group the data into coherent subsets or into coherent clusters for us.K Means algorithm(K均值算法)
input:K(number of clusters) & Training set(x(1),...x(m))
K-‐means algorithm(process)
问题1:初始化的簇中心点没有分配到记录点
处理办法:移除该簇中心点或者重新初始化一个簇中心点
问题2:K-‐means for non-‐separated clusters
处理办法:按照实际应用就近分配
Optimization objective(优化目标)
Random initialization(随机初始化)
过程:设定K值,随机选择K(K<m)个数据点,设定这些数据点为簇中心点
问题:Local optima(局部最优解)
解决办法:多次循环,选择最小值
Choosing the number of clusters(K值选择)
随曲线优化过程和实际应用调整进行选择