• Similarity metrics (updated Aug 8th)


    The following Stack Overflow question explains the difference between scikit-learn's cosine similarity and its cosine pairwise distances:

    https://stackoverflow.com/questions/35281691/scikit-cosine-similarity-vs-pairwise-distances

    So the code in the first tutorial may be wrong: it mixes up distances and similarities.

    https://cambridgespark.com/content/tutorials/implementing-your-own-recommender-systems-in-Python/index.html
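    To illustrate why the mix-up matters, here is a minimal sketch (not the tutorial's code; the matrix `ratings` and the variable names are made up for this example). If a distance matrix is treated as if it held similarities, the "best" neighbour chosen for a user is the least similar one.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Hypothetical toy data: rows are users, columns are items.
ratings = np.array([[5, 4, 0],
                    [10, 8, 0],   # same taste as user 0, scaled by 2
                    [0, 1, 5]])   # very different taste from user 0

sim = cosine_similarity(ratings)                     # larger value = more alike
dis = pairwise_distances(ratings, metric='cosine')   # larger value = less alike

candidates = [1, 2]
# Correct: pick the neighbour with the largest similarity to user 0.
best_by_sim = max(candidates, key=lambda u: sim[0, u])
# Mistake: treating the distance matrix as similarities picks the opposite user.
best_by_misused_dis = max(candidates, key=lambda u: dis[0, u])

print(best_by_sim, best_by_misused_dis)
```

    With this toy data the correct choice is user 1, while the misused distance matrix selects user 2.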

    Here are some simple tests:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

    # Construct a check-in matrix: each row is a user, each column is a venue,
    # and each entry is that user's check-in frequency at that venue.
    # A plain ndarray is used because NumPy no longer recommends np.matrix.
    mat = np.array(
        [[2, 3, 0, 0, 0, 0, 5, 0, 1, 0],
         [20,30,0, 0, 0, 0, 50,0, 10,0],
         [1, 7, 0, 0, 0, 0, 2, 0, 8, 0],
         [2, 3, 0, 0, 0, 0, 0, 0, 1, 0],
         [4, 6, 0, 0, 7, 0, 0, 0, 2, 0]])
    # Pairwise cosine distances between users (0 means same direction)
    user_dis = pairwise_distances(mat, metric='cosine')
    # Pairwise cosine similarities between users (1 means same direction)
    user_sim = cosine_similarity(mat)
    user_dis
    Out[3]: 
    array([[ 0.        ,  0.        ,  0.39561935,  0.40085531,  0.56244658],
           [ 0.        ,  0.        ,  0.39561935,  0.40085531,  0.56244658],
           [ 0.39561935,  0.39561935,  0.        ,  0.23729486,  0.44299892],
           [ 0.40085531,  0.40085531,  0.23729486,  0.        ,  0.26970326],
           [ 0.56244658,  0.56244658,  0.44299892,  0.26970326,  0.        ]])
    user_sim
    Out[4]: 
    array([[ 1.        ,  1.        ,  0.60438065,  0.59914469,  0.43755342],
           [ 1.        ,  1.        ,  0.60438065,  0.59914469,  0.43755342],
           [ 0.60438065,  0.60438065,  1.        ,  0.76270514,  0.55700108],
           [ 0.59914469,  0.59914469,  0.76270514,  1.        ,  0.73029674],
           [ 0.43755342,  0.43755342,  0.55700108,  0.73029674,  1.        ]])

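    We can see that for the most similar users (usr[0] and usr[1], whose rows are proportional), the cosine distance is 0 and the cosine similarity is 1. In general, each entry of user_dis equals 1 minus the corresponding entry of user_sim.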
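    The relationship between the two matrices can be checked directly. This is a small sketch that rebuilds the same matrix and confirms that cosine distance is 1 minus cosine similarity:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Same check-in matrix as in the session above (rows: users, columns: venues).
mat = np.array(
    [[2, 3, 0, 0, 0, 0, 5, 0, 1, 0],
     [20, 30, 0, 0, 0, 0, 50, 0, 10, 0],
     [1, 7, 0, 0, 0, 0, 2, 0, 8, 0],
     [2, 3, 0, 0, 0, 0, 0, 0, 1, 0],
     [4, 6, 0, 0, 7, 0, 0, 0, 2, 0]])

user_sim = cosine_similarity(mat)
user_dis = pairwise_distances(mat, metric='cosine')

# Element-wise check that distance == 1 - similarity (up to rounding error).
print(np.allclose(user_dis, 1.0 - user_sim))
```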

    To avoid confusion, we will use the cosine_similarity function from now on.

    The artificial matrix also shows that cosine_similarity handles some situations well. For example, usr[0] and usr[1] have the same taste, except that usr[1]'s frequencies are 10 times those of usr[0], and cosine similarity gives them a similarity of exactly 1. This is consistent with human intuition.
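    The reason scaling does not matter is that cosine similarity divides the dot product by both vector norms, so any positive factor cancels out. A minimal sketch with a hand-written helper (`cosine_sim` is a name made up for this example, not a scikit-learn function):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity of two 1-D vectors: (a . b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

usr0 = [2, 3, 0, 0, 0, 0, 5, 0, 1, 0]
usr1 = [20, 30, 0, 0, 0, 0, 50, 0, 10, 0]   # usr0 scaled by 10

sim_01 = cosine_sim(usr0, usr1)
# The factor of 10 appears in both the dot product and the norm of usr1, so it cancels.
print(np.isclose(sim_01, 1.0))
```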

    As for the similarity between usr[0] and the other users:

    usr[2]≈usr[3]>usr[4]

    usr[2] goes to all the places usr[0] has gone to; the only difference is the frequencies. usr[3] skips place[6], but at the places it does visit, its frequencies are the same as usr[0]'s. usr[4] also skips place[6] and additionally visits place[4], which usr[0] never visits, so it scores lowest.

    I think this result is quite reasonable, so cosine_similarity seems to reflect the relationships between users well.
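    The ordering of usr[0]'s neighbours discussed above can be reproduced by sorting the other users by their similarity to usr[0]. This is a small sketch; the names `target` and `ranked` are chosen for this example only:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Same check-in matrix as in the session above (rows: users, columns: venues).
mat = np.array(
    [[2, 3, 0, 0, 0, 0, 5, 0, 1, 0],
     [20, 30, 0, 0, 0, 0, 50, 0, 10, 0],
     [1, 7, 0, 0, 0, 0, 2, 0, 8, 0],
     [2, 3, 0, 0, 0, 0, 0, 0, 1, 0],
     [4, 6, 0, 0, 7, 0, 0, 0, 2, 0]])

user_sim = cosine_similarity(mat)

target = 0
others = [u for u in range(mat.shape[0]) if u != target]
# Most similar user first.
ranked = sorted(others, key=lambda u: user_sim[target, u], reverse=True)

for u in ranked:
    print(f"usr[{u}]: {user_sim[target, u]:.4f}")
```

    For this matrix the ranking is usr[1], usr[2], usr[3], usr[4], which matches the comparison above.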

  • Original post: https://www.cnblogs.com/fassy/p/7307131.html