http://scikit-learn.org/stable/modules/clustering.html#k-means
http://my.oschina.net/u/175377/blog/84420
K-Means clustering parameters explained:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans
class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1)
n_clusters : int, optional, default: 8
The number of clusters to form, which is also the number of centroids to generate.
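Below is a minimal sketch that assumes a recent scikit-learn and NumPy. Fitting `KMeans` with `n_clusters=3` produces exactly one centroid per cluster. The three-blob data and the value 3 are illustrative only. `precompute_distances` and `n_jobs` are left out because later scikit-learn releases removed them.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: three well-separated 2-D blobs of 50 points each
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(50, 2))
    for center in ([0, 0], [5, 5], [0, 5])
])

# n_clusters fixes both the number of clusters and the number of centroids
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_.shape)  # (n_clusters, n_features) -> (3, 2)
print(np.unique(km.labels_))      # one label per cluster
```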
max_iter : int, default: 300
Maximum number of iterations of the k-means algorithm for a single run.
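The sketch below shows the effect of `max_iter` with the initialization held fixed (`n_init=1`, same seed). It assumes a recent scikit-learn, which exposes the `n_iter_` attribute. `n_iter_` never exceeds `max_iter`, and allowing more iterations can only lower (or keep) the inertia. The uniform random data is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(42)
X = rng.rand(200, 2)  # illustrative uniform data

results = {}
for max_iter in (1, 5, 300):
    km = KMeans(n_clusters=4, max_iter=max_iter, n_init=1, random_state=0).fit(X)
    # n_iter_ is the number of iterations the run actually used (<= max_iter)
    results[max_iter] = (km.n_iter_, km.inertia_)
    print(max_iter, km.n_iter_, round(km.inertia_, 4))
```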
n_init : int, default: 10
Number of times the k-means algorithm will be run with different centroid seeds. The final result is the best output of the n_init consecutive runs in terms of inertia (the run with the lowest inertia wins).
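The selection rule behind `n_init` can be emulated by hand: perform several single-initialization runs with different seeds and keep the one with the lowest inertia. This is a sketch. scikit-learn derives its per-run seeds internally, so the result will not necessarily match `n_init=5` exactly.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.rand(300, 2)  # illustrative data

# Five independent single-init runs, each seeded differently
runs = [KMeans(n_clusters=5, n_init=1, random_state=seed).fit(X) for seed in range(5)]

# "Best output in terms of inertia" = lowest sum of squared distances
best = min(runs, key=lambda km: km.inertia_)

print([round(km.inertia_, 3) for km in runs])
print("best inertia:", round(best.inertia_, 3))
```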
init : {‘k-means++’, ‘random’ or an ndarray}
Method for initialization, defaults to ‘k-means++’:
‘k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.
‘random’ : choose k observations (rows) at random from the data for the initial centroids.
If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
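The three `init` options can be illustrated with a NumPy sketch. `random_init` and `kmeans_plus_plus_init` are hypothetical helpers written for this example. The k-means++ version is the basic algorithm: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. scikit-learn's own k-means++ additionally tries several candidates per step, which is a greedy variant. The resulting array is then passed as an ndarray `init` with `n_init=1`.

```python
import numpy as np
from sklearn.cluster import KMeans


def random_init(X, k, rng):
    """'random': pick k distinct rows of X as the initial centroids."""
    idx = rng.choice(X.shape[0], size=k, replace=False)
    return X[idx].copy()


def kmeans_plus_plus_init(X, k, rng):
    """Basic k-means++ seeding (hypothetical helper, not scikit-learn's code)."""
    centers = [X[rng.randint(X.shape[0])]]  # first center: uniform at random
    for _ in range(1, k):
        # squared distance from every point to its closest chosen center
        diff = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = np.min(np.sum(diff ** 2, axis=2), axis=1)
        # points far from existing centers are more likely to be picked
        centers.append(X[rng.choice(X.shape[0], p=d2 / d2.sum())])
    return np.array(centers)


rng = np.random.RandomState(0)
X = rng.rand(100, 2)  # illustrative data

init_centers = kmeans_plus_plus_init(X, k=3, rng=rng)
print(init_centers.shape)  # (n_clusters, n_features) -> (3, 2)

# An ndarray init fixes the starting centers, so one run (n_init=1) is enough
km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(X)
```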
precompute_distances : {‘auto’, True, False}
Precompute distances (faster but takes more memory).
‘auto’ : do not precompute distances if n_samples * n_clusters > 12 million. This corresponds to about 100 MB of overhead per job using double precision.
True : always precompute distances
False : never precompute distances
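The ‘auto’ rule and its 100 MB figure follow from simple arithmetic: 12 million float64 entries take 12e6 × 8 bytes = 96 MB. The sketch below uses `auto_precompute` and `distance_matrix_bytes` as hypothetical helpers that restate the documented rule. It also shows one common way to build a full samples-by-centers squared-distance matrix with the ‖x‖² − 2x·c + ‖c‖² expansion. That expansion is an illustration and is not necessarily scikit-learn's internal code.

```python
import numpy as np

THRESHOLD = 12_000_000  # n_samples * n_clusters cut-off from the 'auto' rule


def auto_precompute(n_samples, n_clusters):
    """Hypothetical helper: precompute unless the matrix exceeds 12 million entries."""
    return n_samples * n_clusters <= THRESHOLD


def distance_matrix_bytes(n_samples, n_clusters, itemsize=8):
    """Memory of a dense (n_samples, n_clusters) float64 distance matrix."""
    return n_samples * n_clusters * itemsize


print(auto_precompute(1_000_000, 8))                    # 8 million entries -> True
print(auto_precompute(2_000_000, 8))                    # 16 million entries -> False
print(distance_matrix_bytes(1_500_000, 8) / 1e6, "MB")  # 12 million doubles -> 96.0 MB

# Precomputing all squared distances at once: ||x||^2 - 2 x.c + ||c||^2
rng = np.random.RandomState(0)
X = rng.rand(1000, 3)
C = rng.rand(4, 3)
D = (X ** 2).sum(axis=1)[:, None] - 2 * X @ C.T + (C ** 2).sum(axis=1)[None, :]
```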
tol : float, default: 1e-4
Relative tolerance with regard to inertia to declare convergence.
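The following is a minimal Lloyd-iteration sketch showing how `tol` and `max_iter` can interact. It reads the description literally and stops once the relative decrease in inertia falls below `tol`. This is an assumption: recent scikit-learn documentation defines `tol` in terms of how far the cluster centers move between iterations, so the library's test differs. `lloyd` and `_assign` are hypothetical helpers.

```python
import numpy as np


def _assign(X, centers):
    """Assignment step: nearest center for every sample, plus the inertia."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, d2[np.arange(len(X)), labels].sum()


def lloyd(X, centers, max_iter=300, tol=1e-4):
    """Lloyd loop that stops when inertia improves by less than tol (relative)."""
    centers = np.array(centers, dtype=float)  # work on a float copy
    prev_inertia = np.inf
    for n_iter in range(1, max_iter + 1):
        labels, inertia = _assign(X, centers)
        # relative-improvement test (skipped on the first pass)
        if n_iter > 1 and prev_inertia - inertia <= tol * prev_inertia:
            break
        prev_inertia = inertia
        # update step: move each center to the mean of its members
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):  # leave an empty cluster's center unchanged
                centers[k] = members.mean(axis=0)
    else:
        # max_iter reached: relabel so labels and inertia match the final centers
        labels, inertia = _assign(X, centers)
    return centers, labels, inertia, n_iter


rng = np.random.RandomState(0)
X = rng.rand(200, 2)  # illustrative data
centers, labels, inertia, n_iter = lloyd(X, X[:3], tol=1e-4)
print(n_iter, round(float(inertia), 4))
```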
n_jobs : int
The number of jobs to use for the computation. This works by computing each of the n_init runs in parallel.
If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
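The negative-`n_jobs` rule is easier to see as a formula. `effective_n_jobs` below is a hypothetical helper that restates the documented rule, with `n_cpus` passed in explicitly. Clamping the result to at least 1 is an added assumption. Later scikit-learn releases removed `n_jobs` from `KMeans`, so this is only an illustration of the older behaviour.

```python
import os


def effective_n_jobs(n_jobs, n_cpus):
    """Hypothetical helper: number of workers implied by the n_jobs rule."""
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, i.e. n_cpus + 1 + n_jobs
        return max(n_cpus + 1 + n_jobs, 1)  # clamp to 1 (assumption)
    return n_jobs


print(effective_n_jobs(-1, 8))  # 8: all CPUs
print(effective_n_jobs(-2, 8))  # 7: all CPUs but one
print(effective_n_jobs(1, 8))   # 1: no parallel code
print(effective_n_jobs(-1, os.cpu_count() or 1))  # this machine
```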
random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
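Passing an integer seed makes the centroid initialization, and therefore the whole fit, reproducible. The sketch below assumes a recent scikit-learn and fits the same illustrative data twice with `random_state=7`. Passing one shared `numpy.RandomState` instance to two fits would not reproduce the result this way, because the first fit advances its internal state.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.rand(200, 2)  # illustrative data

# Same integer seed -> same initial centers -> same clustering
a = KMeans(n_clusters=4, n_init=10, random_state=7).fit(X)
b = KMeans(n_clusters=4, n_init=10, random_state=7).fit(X)
print(np.array_equal(a.labels_, b.labels_))
```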
verbose : int, default 0
Verbosity mode.
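copy_x : boolean, default True
When pre-computing distances it is more numerically accurate to center the data first. If copy_x is True, the original data is not modified. If False, the original data is modified in place and put back before the function returns, but small numerical differences may be introduced by subtracting and then adding the data mean.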
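Here is a NumPy sketch of what in-place centering implies (the `copy_x=False` case). The mean is subtracted from the caller's array and later added back. The values come back equal within floating-point tolerance but not necessarily bit-for-bit, and the large offset in the illustrative data makes that visible. The last lines assume a recent scikit-learn and check that the default `copy_x=True` leaves the input array untouched.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.rand(1000, 3) * 1e6 + 0.1  # large values make round-off visible
original = X.copy()

# copy_x=False idea: center the caller's array in place ...
mean = X.mean(axis=0)
X -= mean
# ... (clustering would run here on the centered data) ... then restore it
X += mean

print(np.allclose(X, original))    # restored within tolerance
print(np.abs(X - original).max())  # may be tiny but non-zero

# With the default copy_x=True the input is left as it was
Y = rng.rand(100, 2)
Y_before = Y.copy()
KMeans(n_clusters=2, n_init=1, random_state=0, copy_x=True).fit(Y)
```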
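Attributes:
cluster_centers_ : array, [n_clusters, n_features]
Coordinates of cluster centers.
labels_ : array, [n_samples]
Label (cluster index) of each point.
inertia_ : float
Sum of squared distances of samples to their closest cluster center.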
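The three fitted attributes are related. Indexing `cluster_centers_` with `labels_` gives each sample's assigned center, and summing the squared differences recovers `inertia_`. The sketch below assumes a recent scikit-learn and uses illustrative random data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.rand(300, 2)  # illustrative data
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# labels_[i] is the index of the center that sample i belongs to
assigned = km.cluster_centers_[km.labels_]

# inertia_ = sum of squared distances from each sample to its center
manual_inertia = ((X - assigned) ** 2).sum()

print(km.cluster_centers_.shape, km.labels_.shape)  # (4, 2) (300,)
print(np.isclose(manual_inertia, km.inertia_))
```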