• Spark MLlib Concepts 5: Cosine Similarity


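    Overview:
    Cosine similarity describes how similar two vectors are, expressed as the cosine of the angle between them. When the two vectors point in the same direction (an angle of 0), the cosine is 1, indicating strong similarity; when they are perpendicular (in linear algebra, two perpendicular dimensions are independent of each other), the cosine is 0, indicating that they are unrelated.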
    Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a Cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0,1].

    Definition
    Background:

    The cosine of two vectors can be derived by using the Euclidean dot product formula:

    \mathbf{a}\cdot\mathbf{b} = \left\|\mathbf{a}\right\| \left\|\mathbf{b}\right\| \cos\theta
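
    As a small worked example of the dot product formula (the vectors a = (1, 0) and b = (1, 1) are chosen here purely for illustration and do not come from the original post), solving for the cosine gives an angle of 45°:

```latex
\begin{align*}
\mathbf{a} &= (1, 0), \qquad \mathbf{b} = (1, 1) \\
\mathbf{a}\cdot\mathbf{b} &= 1\cdot 1 + 0\cdot 1 = 1 \\
\|\mathbf{a}\| &= \sqrt{1^2 + 0^2} = 1, \qquad \|\mathbf{b}\| = \sqrt{1^2 + 1^2} = \sqrt{2} \\
\cos\theta &= \frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|} = \frac{1}{\sqrt{2}} \approx 0.707, \qquad \theta = 45^\circ
\end{align*}
```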

    Given two vectors of attributes, A and B, the cosine similarity, cos(θ), is represented using a dot product and magnitude as

    \text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \|B\|} = \frac{ \sum\limits_{i=1}^{n}{A_i \times B_i} }{ \sqrt{\sum\limits_{i=1}^{n}{(A_i)^2}} \times \sqrt{\sum\limits_{i=1}^{n}{(B_i)^2}} }

    The resulting similarity ranges from −1 meaning exactly opposite, to 1 meaning exactly the same, with 0 usually indicating independence, and in-between values indicating intermediate similarity or dissimilarity.
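
    The definition above can be sketched in a few lines of plain Python. This is only a minimal illustration, not Spark MLlib code: the function name `cosine_similarity` and the sample vectors are assumptions made for this example, and raising an error for a zero vector is one possible choice, since the formula divides by the magnitudes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))       # sum of A_i * B_i
    norm_a = math.sqrt(sum(x * x for x in a))    # magnitude of A
    norm_b = math.sqrt(sum(y * y for y in b))    # magnitude of B
    if norm_a == 0.0 or norm_b == 0.0:
        # The formula divides by both magnitudes, so a zero vector has no angle.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical sample vectors covering the three reference cases.
same = cosine_similarity([1.0, 2.0], [3.0, 6.0])         # same direction, close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 5.0])   # perpendicular, 0
opposite = cosine_similarity([1.0, 2.0], [-2.0, -4.0])   # opposite direction, close to -1
```

    Scaling either input by a positive constant leaves the result unchanged up to floating-point rounding, which is the sense in which the measure reflects orientation rather than magnitude.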

    Relationship to the Pearson correlation coefficient
    If the attribute vectors are normalized by subtracting the vector means (e.g., A - \bar{A}), the measure is called centered cosine similarity and is equivalent to the Pearson correlation coefficient.
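
    The equivalence can be checked numerically with a short sketch. The helper names (`centered_cosine`, `pearson`) and the sample data are assumptions for illustration only; Pearson's coefficient is computed here from its covariance-over-standard-deviations definition so that the two routes are written independently.

```python
import math
from statistics import fmean, pstdev


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centered_cosine(a, b):
    """Cosine similarity after subtracting each vector's own mean."""
    mean_a, mean_b = fmean(a), fmean(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def pearson(a, b):
    """Pearson correlation: population covariance over the product of std devs."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = fmean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov / (pstdev(a) * pstdev(b))


# Hypothetical sample data.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0, 5.0]

centered = centered_cosine(a, b)
correlation = pearson(a, b)

# Adding a constant to every component changes the plain cosine similarity
# but not the centered version.
a_shifted = [x + 100.0 for x in a]
plain_before = cosine_similarity(a, b)
plain_after = cosine_similarity(a_shifted, b)
centered_after = centered_cosine(a_shifted, b)
```

    For this data both `centered` and `correlation` come out at roughly 0.8, while `plain_before` and `plain_after` differ, which shows why centering matters when the vectors carry a constant offset.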

  • Original post: https://www.cnblogs.com/zwCHAN/p/4265882.html