Paper Reading Notes 2017/3/31
References
[1.140] H. Ma, I. King, and M. R. Lyu, “Learning to recommend with social trust ensemble,” in Proc. 32nd Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2009, pp. 203–210
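Recommendation with Social Trust Ensemble
Motivation
Problems common to modern recommendation algorithms:
- The user-item matrix is sparse. In many commercial recommender systems its density is below 1%.
- Users are usually assumed to be independent and identically distributed, i.e. all users rate items under the same model. This ignores the social relations between users and does not match the real world.
- Cold start (note by Sun Xiangguo).
Therefore, mining the user-item matrix alone does not yield good recommendations. The paper introduces trust-awareness and combines it with the user's own tastes: a user's final decision is a balance between his own tastes and his trusted friends' favors. In the user-item matrix \(R\), the paper regards \(R_{i,j}\) as reflecting both user \(u_i\)'s own tastes and the tastes of his trusted friends on item \(v_j\).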
In terms of the users’ own tastes, we factorize the user-item matrix and learn two low-dimensional matrices, which are user-specific latent matrix and item-specific latent matrix. For the social trust graph, based on the intuition that users always prefer the items recommended by the friends they trust, we infer and formulate the recommendation problem purely based on their trusted friends’ favors.
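[Question:] The paper already assumes that \(R\) is a blend of the user's tastes and the trusted friends' favors, so why does factorizing \(R\) yield only the user's tastes? Shouldn't the result be a blend of both?
Possible improvement: obtain the user's tastes from the user's action stream.
Related Work
The paper reviews two lines of work: 1. recommendation based on collaborative filtering; 2. social trust-based recommendation.
1. Recommendation based on collaborative filtering
(For more detail, see the author's other document 《推荐算法概述》, "Overview of Recommendation Algorithms".)
Traditional collaborative filtering focuses on the user-item matrix and has three branches: user-based, item-based, and model-based. User-based and item-based methods are together called memory-based (neighborhood-based) collaborative filtering.
Memory-based collaborative filtering: predict the target user's rating of an unseen item from the ratings given by similar users (user-based), or from the ratings the target user gave to similar items (item-based), or combine the two.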
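The user-based variant described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the notes: cosine similarity over co-rated items is used as the similarity measure, and the names `cosine_similarity`, `predict_user_based` and the toy `ratings` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity restricted to the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings of `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    # None signals that no neighbor has rated the item.
    return num / den if den else None

# Toy data (assumed): user -> {item: rating}
ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}
prediction = predict_user_based(ratings, "alice", "c")
```

Because bob's ratings match alice's more closely than carol's do, the prediction for item "c" is pulled toward bob's rating. The item-based variant would swap the roles of users and items.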
Model-based:
Examples include aspect models [7, 8, 21], the latent factor model [4], the Bayesian hierarchical model [24], the ranking model [11], and matrix factorization methods [16, 18, 19, 22].
These methods focus on factorizing the user-item rating matrix using low-rank representations, and then utilize them to make further predictions. The motivation behind a low-dimensional factorization model is that there is only a small number of factors that are important, and a user’s preference vector is determined by how each factor applies to that user.
2. Recommendation based on social trust
[1,2,13,14,15]
[1] developed a set of five natural axioms that a trust-based recommendation system might be expected to satisfy, and then proved that no system can simultaneously satisfy all the axioms.
[14, 15] studied the trust-aware recommender systems. Their work replaces the similarity finding process with the use of a trust metric, which is able to propagate trust over the trust network and to estimate a trust weight. The experiments on a large real dataset show that this work increases the coverage (number of ratings that are predictable) while not reducing the accuracy (the error of predictions).
[2] proposed a trust-based recommender system for the Semantic Web; this system runs on a server with the knowledge distributed over the network in the form of ontologies, and uses the Web of trust to generate the recommendations.
[13] developed a factor analysis method based on the probabilistic graphical model which fuses the user-item matrix with the users’ social trust networks by sharing a common latent low-dimensional user feature matrix.
Rec with social trust
1. Problem description
In the real world, the process of recommendation scenario includes two central elements:
1. the trust network and the favors of these friends in Fig. 1(a).
2. the user-item rating matrix in Fig. 1(b).
The problem we study in this paper is how to predict the missing values for the users effectively and efficiently by employing the trust graph and the user-item rating matrix.
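2. User feature representation
In this paper the user features come from factorizing the user-item matrix. In fact, we could also use third-party information and learn user features with an autoencoder. For the general workflow and essence of recommendation by matrix factorization, see the notes of 2017/3/21.
As noted before (2017/3/21), the essence of recommendation by matrix factorization is to find the best matrices \((U,V)\) that minimize the following expression (it was covered on 2017/3/21, so it is not explained again here):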
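The expression itself is not reproduced in these notes. A common regularized form, given here only as an assumption (the regularization weights \(\lambda_U,\lambda_V\) and the item count \(n\) are not named in the notes; \(I_{ij}^R\) is the rating indicator defined further below), is:

\[
\min_{U,V}\ \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n} I_{ij}^{R}\left(R_{ij}-U_i^{T}V_j\right)^{2}
+\frac{\lambda_U}{2}\lVert U\rVert_F^{2}+\frac{\lambda_V}{2}\lVert V\rVert_F^{2}
\]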
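To find suitable \((U,V)\), one feasible approach is Bayesian estimation. That is, we want to maximize:
which yields the maximum a posteriori (MAP) estimate of \((U,V)\).
By Bayes' theorem:
The paper assumes \(U\) and \(V\) are independent, so the expression above simplifies further to: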
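The three formulas for these steps are missing from the notes. Reconstructed from the distributions named in the next paragraph (so a sketch under that assumption, not a copy of the original), the chain from posterior to Bayes' theorem to the independence simplification reads:

\[
p(U,V\mid R,\overrightarrow{\theta_1},\overrightarrow{\theta_2})
=\frac{p(R\mid U,V,\overrightarrow{\theta_1})\,p(U,V\mid\overrightarrow{\theta_2})}{p(R\mid\overrightarrow{\theta_1},\overrightarrow{\theta_2})}
\propto p(R\mid U,V,\overrightarrow{\theta_1})\,p(U\mid\overrightarrow{\theta_2})\,p(V\mid\overrightarrow{\theta_2})
\]

The denominator does not depend on \((U,V)\), which is why it can be dropped when maximizing.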
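Hence we need three probability distributions: \(p(R|U,V,\overrightarrow{\theta_1})\), \(p(U|\overrightarrow{\theta_2})\), \(p(V|\overrightarrow{\theta_2})\). The rest of the paper follows this line of reasoning.
Following [1.140.19], the paper takes the conditional distribution of the observed matrix to be: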
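The formula is missing here. Based on the symbol explanations that follow, it presumably has the form below, where the variance symbol \(\sigma_R^2\) (playing the role of \(\overrightarrow{\theta_1}\)) and the item count \(n\) are assumptions:

\[
p(R\mid U,V,\sigma_R^{2})=\prod_{i=1}^{m}\prod_{j=1}^{n}
\left[\mathcal{N}\!\left(R_{ij}\mid g(U_i^{T}V_j),\ \sigma_R^{2}\right)\right]^{I_{ij}^{R}}
\]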
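Reading the formula:
\(\mathcal{N}\left(x|\mu,\sigma^2\right)\) is the probability density function of the Gaussian distribution (note 1) with mean \(\mu\) and variance \(\sigma^2\), and \(I_{ij}^R\) is the indicator function that is equal to 1 if user \(u_i\) rated item \(v_j\) and equal to 0 otherwise. The function \(g(x)\) is the logistic function (note 2) \(g(x) = 1/(1 + \exp(-x))\), which makes it possible to bound the range of \(U_i^TV_j\) within the range [0,1].
Note 1: Why a Gaussian distribution?
Note that \(R\approx U^TV\). In general we regard \(U^TV\) as the principal component of \(R\), and correspondingly \(R_{ij}\approx U_i^TV_j\), i.e. \(U_i^TV_j\) is the principal component of \(R_{ij}\). A Gaussian distribution is a reasonable way to express this "principal component" idea: the observed rating is centered on the prediction and deviates from it only by noise.
Note 2: The logistic function can be used to construct pseudo-probabilities and to normalize. It is used here because the paper assumes \(R_{ij}\in (0,1]\); to keep \(U_i^TV_j\) from leaving that range, it is squashed with the logistic function.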
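To make the two notes concrete, here is a minimal Python sketch of the logistic squashing and the Gaussian log-likelihood of the observed ratings. The data layout (`R` as a dict keyed by `(i, j)`, `U` and `V` as lists of latent vectors) and all function names are assumptions made for illustration.

```python
import math

def logistic(x):
    """g(x) = 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def gaussian_log_pdf(x, mean, variance):
    """Log density of N(x | mean, variance)."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)

def log_likelihood(R, U, V, variance):
    """Sum of log N(R_ij | g(U_i^T V_j), variance) over observed ratings only.

    Iterating over the keys of R plays the role of the indicator I_ij^R.
    """
    total = 0.0
    for (i, j), r in R.items():
        score = sum(a * b for a, b in zip(U[i], V[j]))
        total += gaussian_log_pdf(r, logistic(score), variance)
    return total

# Toy latent vectors (assumed): the prediction g(2.0) is about 0.88.
U = [[1.0, 0.0]]
V = [[2.0, 0.0]]
close = log_likelihood({(0, 0): 0.9}, U, V, variance=0.1)
far = log_likelihood({(0, 0): 0.1}, U, V, variance=0.1)
```

A rating near the squashed prediction gets a higher log-likelihood than one far from it, which is the sense in which the prediction is the "principal component" of the rating.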
Generative (prior) distributions of \((U,V)\):
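The priors are not shown in the notes. Given the covariance terms \(\sigma_U^2 I\) and \(\sigma_V^2 I\) mentioned next, they are presumably zero-mean spherical Gaussians (the zero mean and the item count \(n\) are assumptions here):

\[
p(U\mid\sigma_U^{2})=\prod_{i=1}^{m}\mathcal{N}\!\left(U_i\mid 0,\ \sigma_U^{2}I\right),\qquad
p(V\mid\sigma_V^{2})=\prod_{j=1}^{n}\mathcal{N}\!\left(V_j\mid 0,\ \sigma_V^{2}I\right)
\]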
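Here \(\sigma_U^2I\) and \(\sigma_V^2I\) are the covariance matrices of the two priors, with \(I\) the identity matrix, so every component of \(U_i\) has variance \(\sigma_U^2\) and every component of \(V_j\) has variance \(\sigma_V^2\).
Then, by equation (6), we have: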
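The resulting posterior is not reproduced in the notes. Multiplying a Gaussian likelihood by zero-mean spherical Gaussian priors would give the following, stated as an assumption with \(\sigma_R^2\) denoting the observation variance and \(n\) the number of items:

\[
p(U,V\mid R,\sigma_R^{2},\sigma_U^{2},\sigma_V^{2})\propto
\prod_{i=1}^{m}\prod_{j=1}^{n}\left[\mathcal{N}\!\left(R_{ij}\mid g(U_i^{T}V_j),\sigma_R^{2}\right)\right]^{I_{ij}^{R}}
\prod_{i=1}^{m}\mathcal{N}\!\left(U_i\mid 0,\sigma_U^{2}I\right)
\prod_{j=1}^{n}\mathcal{N}\!\left(V_j\mid 0,\sigma_V^{2}I\right)
\]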
Equation (9) corresponds to the probabilistic graphical model in Fig. 2(a).
3. Recommendation based on trusted friends
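Let \(\mathcal{G} = (\mathcal{U},\mathcal{E})\) denote the social trust graph, where \(\mathcal{U}\) is the set of users (\(m\) users) and \(\mathcal{E}\) is the set of trust relations between users. Let \(S = \{S_{ij}\}\) be an \(m \times m\) matrix, called the social trust matrix. \(S_{ij} \in (0,1]\) denotes how much user \(u_i\) trusts user \(u_j\). Note that \(S\) is asymmetric, because \(u_i\) trusting \(u_j\) does not imply that \(u_j\) trusts \(u_i\).
Following the idea of the previous section, the core of recommendation based on trusted friends is to take a weighted average of the trusted friends' ratings as the estimated rating of the target user. Concretely, let \(\mathcal{T}(i)\) denote the set of user \(i\)'s trusted friends; then user \(i\)'s estimated rating of item \(k\) is: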
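The formula (referred to later as equation (10)) is missing from the notes. From the description, a trust-weighted average over \(\mathcal{T}(i)\), it is presumably:

\[
\hat{R}_{ik}=\frac{\sum_{j\in\mathcal{T}(i)}R_{jk}\,S_{ij}}{\lvert\mathcal{T}(i)\rvert}
\]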
In fact, \(|\mathcal{T}(i)|\) is the same for every entry of row \(S_{i,:}\), so it can be absorbed into \(S\) by setting \(S_{i,j}=S_{i,j}/|\mathcal{T}(i)|,\ j \in \mathcal{T}(i)\). Equation (10) then simplifies to:
This gives:
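The simplified formulas are not shown in the notes; presumably they are \(\hat{R}_{ik}=\sum_{j\in\mathcal{T}(i)}S_{ij}R_{jk}\) and, in matrix form, \(\hat{R}=SR\). The following Python sketch checks on toy data that absorbing \(1/|\mathcal{T}(i)|\) into \(S\) leaves the estimate unchanged. All names and numbers are hypothetical.

```python
def predict_from_friends(R, S, trusted, i, k):
    """Trust-weighted average of the friends' ratings, divided by |T(i)|."""
    friends = trusted[i]
    return sum(R[j][k] * S[i][j] for j in friends) / len(friends)

def normalize_trust(S, trusted):
    """Return a copy of S with 1/|T(i)| folded into row i for j in T(i)."""
    S_norm = [row[:] for row in S]
    for i, friends in trusted.items():
        for j in friends:
            S_norm[i][j] = S[i][j] / len(friends)
    return S_norm

def predict_normalized(R, S_norm, trusted, i, k):
    """Same estimate, using the already-normalized trust matrix."""
    return sum(S_norm[i][j] * R[j][k] for j in trusted[i])

# Toy data (assumed): 3 users, 2 items; user 0 trusts users 1 and 2.
R = [[0.0, 0.0], [0.8, 0.4], [0.6, 1.0]]
S = [[0.0, 1.0, 0.5], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
trusted = {0: [1, 2]}

direct = predict_from_friends(R, S, trusted, 0, 0)
S_norm = normalize_trust(S, trusted)
simplified = predict_normalized(R, S_norm, trusted, 0, 0)
```

Both routes give the same value for user 0 and item 0, which is all the normalization step claims.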
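Since \(\hat{R}\) is also an approximation of \(R\) (regarded as its principal component), the observation distribution of \(R\) can again be taken as Gaussian. Note that \(R_{jk}\) in equation (10) is replaced by \(U_j^TV_k\), because user \(i\)'s trusted friends will not always have rated item \(k\). In addition, the goal of this subsection is also to obtain the matrices \(U,V\). The observation distribution of \(R\) is therefore: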
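The distribution (referred to later as equation (14)) is missing from the notes. Consistent with the symbols explained right after it, it presumably reads as follows, where the variance symbol \(\sigma_S^2\) and the item count \(n\) are assumptions:

\[
p(R\mid S,U,V,\sigma_S^{2})=\prod_{i=1}^{m}\prod_{j=1}^{n}
\left[\mathcal{N}\!\left(R_{ij}\ \Big|\ g\Big(\sum_{k\in\mathcal{T}(i)}S_{ik}\,U_k^{T}V_j\Big),\ \sigma_S^{2}\right)\right]^{I_{ij}^{R}}
\]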
where \(S_{ik}\) is normalized by \(|\mathcal{T}(i)|\), which is the number of trusted friends of user \(u_i\) in the set \(\mathcal{T}(i)\). \(I_{ij}^R\) is the indicator function that is equal to 1 if user \(i\) rated item \(j\) and equal to 0 otherwise.
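Based on equation (14), we can likewise obtain a Bayesian estimate similar to that of the previous section:
Assuming that the users' trust network is independent of \(U,V\) (in practice this means the trust network between users is unrelated to the user-movie rating matrix, which is a natural assumption), the expression above simplifies further to: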
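The simplified posterior is not reproduced in the notes. Under the stated independence assumption, and with Gaussian priors on \(U\) and \(V\) as in the previous section, it would take the form below (the variance symbol \(\sigma_S^2\) is an assumption):

\[
p(U,V\mid R,S,\sigma_S^{2},\sigma_U^{2},\sigma_V^{2})\propto
p(R\mid S,U,V,\sigma_S^{2})\,p(U\mid\sigma_U^{2})\,p(V\mid\sigma_V^{2})
\]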
Equation (16) corresponds to the probabilistic graphical model in Fig. 2(b).
4. Social trust ensemble
Building on Sections 2 and 3, this subsection combines the two models into one, i.e. the "ensemble".
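The combined distribution is not shown in the notes. With \(\alpha\) weighting the user's own taste against the trusted friends' favors, it presumably takes the following form (the item count \(n\) is an assumption; \(\sigma\) and \(\alpha\) are the symbols used in the notes):

\[
p(R\mid S,U,V,\sigma^{2})=\prod_{i=1}^{m}\prod_{j=1}^{n}
\left[\mathcal{N}\!\left(R_{ij}\ \Big|\ g\Big(\alpha\,U_i^{T}V_j+(1-\alpha)\sum_{k\in\mathcal{T}(i)}S_{ik}\,U_k^{T}V_j\Big),\ \sigma^{2}\right)\right]^{I_{ij}^{R}}
\]

Setting \(\alpha=1\) recovers the pure matrix factorization model of Section 2, and \(\alpha=0\) the purely trust-based model of Section 3.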
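The corresponding probabilistic graphical model is Fig. 2(c).
The parameters here are \(\sigma,\sigma_U,\sigma_V\), and the hyperparameter is \(\alpha\). Next, we use gradient descent for parameter estimation: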
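The update formulas are missing from the notes. As a rough sketch, maximizing the posterior of the ensemble model is equivalent to minimizing a squared error on observed ratings plus L2 penalties on \(U\) and \(V\), and that objective can be minimized by full-batch gradient descent. Everything below is an assumption made for illustration: a single regularization weight `lam` for both matrices, the learning rate, the toy data, and the function names.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def blended_user(U, S, i, alpha):
    """alpha * U_i + (1 - alpha) * sum over trusted k of S_ik * U_k."""
    vec = [alpha * x for x in U[i]]
    for k, s_ik in S.get(i, {}).items():
        for d in range(len(vec)):
            vec[d] += (1 - alpha) * s_ik * U[k][d]
    return vec

def loss(R, S, U, V, alpha, lam):
    """Half squared error on observed ratings plus half L2 penalty."""
    sq = sum((r - logistic(dot(blended_user(U, S, i, alpha), V[j]))) ** 2
             for (i, j), r in R.items())
    reg = sum(x * x for row in U + V for x in row)
    return 0.5 * sq + 0.5 * lam * reg

def gradient_step(R, S, U, V, alpha, lam, lr):
    """One full-batch gradient descent update of U and V, in place."""
    gU = [[lam * x for x in row] for row in U]
    gV = [[lam * x for x in row] for row in V]
    for (i, j), r in R.items():
        blend = blended_user(U, S, i, alpha)
        pred = logistic(dot(blend, V[j]))
        delta = (pred - r) * pred * (1 - pred)  # error times g'(z)
        for d in range(len(blend)):
            gV[j][d] += delta * blend[d]
            gU[i][d] += alpha * delta * V[j][d]
            # Friends' vectors also receive gradient through the trust term.
            for k, s_ik in S.get(i, {}).items():
                gU[k][d] += (1 - alpha) * delta * s_ik * V[j][d]
    for M, G in ((U, gU), (V, gV)):
        for row, grow in zip(M, G):
            for d in range(len(row)):
                row[d] -= lr * grow[d]

rng = random.Random(0)
R = {(0, 0): 1.0, (0, 1): 0.2, (1, 0): 0.8, (2, 1): 0.4}  # (user, item) -> rating
S = {0: {1: 0.5, 2: 0.5}}  # row-normalized trust: user 0 trusts users 1 and 2
U = [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(3)]
V = [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(2)]
alpha, lam, lr = 0.4, 0.01, 0.2

initial_loss = loss(R, S, U, V, alpha, lam)
for _ in range(1000):
    gradient_step(R, S, U, V, alpha, lam, lr)
final_loss = loss(R, S, U, V, alpha, lam)
```

Note that a rating by user \(i\) also updates the latent vectors of \(i\)'s trusted friends, because they enter the prediction through the \((1-\alpha)\) term.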
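5. Complexity analysis
It is worth noting that, when analyzing complexity, the paper takes into account that real online social networks follow a power-law distribution, which gives a better complexity estimate. This practice is worth borrowing.
6. Improvements
As Fig. 2(c) shows, and since you previously read a paper based on multiple contexts, you could merge that paper's probabilistic graphical model with this one.
In addition, predicting user behavior from the user's action stream amounts to deriving the user's preference sequence over time from that stream; adding a decay function then yields a user-item similarity.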
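One way the decay idea could look in code is sketched below. The notes do not specify the decay function, so exponential decay with a half-life is purely an assumption, as are the function name and the `(item, timestamp)` action format.

```python
import math

def decayed_affinity(actions, now, half_life):
    """Sum exponentially decayed weights of a user's actions per item.

    actions: list of (item, timestamp) pairs; an action that is exactly
    `half_life` time units old counts half as much as one made right now.
    """
    rate = math.log(2) / half_life
    scores = {}
    for item, timestamp in actions:
        weight = math.exp(-rate * (now - timestamp))
        scores[item] = scores.get(item, 0.0) + weight
    return scores

# Toy action stream (assumed): times are in days.
actions = [("video_a", 10.0), ("video_b", 3.0), ("video_b", 10.0)]
scores = decayed_affinity(actions, now=10.0, half_life=7.0)
```

The resulting per-item scores could serve as the user-item similarity mentioned above, with recent actions dominating older ones.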
Paper Reading Notes 2017/4/1
[7] Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for youtube recommendations." Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016.
[8] Christakopoulou, Evangelia, and George Karypis. "Local item-item models for top-n recommendation." Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016.
1. YouTube deep neural network recommendation
1. [Sun Xiangguo] Problems with deep learning:
SHORTCOMINGS
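Source: Tie-Yan Liu, Microsoft Research Asia
- Labeling large amounts of data is expensive.
- Training deep models is hard going.
Training is unstable, and may exceed GPU capacity or take too long.
- Distributed computing is a dilemma.
The technology is immature, and efficiency conflicts with accuracy.
- Hyperparameter tuning is a black art.
There is no automated method for tuning hyperparameters yet; it relies on experience.
- Black-box algorithms are opaque.
Lack of interpretability and correctability limits applications (e.g. medicine, military).
- Brute-force solutions miss the point.
They fit the surface of the data without modeling the (simple and elegant) mechanism that generates it.
- Animal-level intelligence heads in the wrong direction.
The essential difference between humans and animals has not been captured; knowledge transmission, education systems and the like in human society are not modeled.
2. Deep learning for recommendation
References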
[7] Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for youtube recommendations." Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016.
[7.19] X. Su and T. M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in artificial intelligence, 2009:4, 2009.
[7.17] K. J. Oh, W. J. Lee, C. G. Lim, and H. J. Choi. Personalized news recommendation using classified keywords to capture user preference. In 16th International Conference on Advanced Communication Technology, pages 1283-1287, Feb 2014.
[7.8] W. Huang, Z. Wu, L. Chen, P. Mitra, and C. L. Giles. A neural probabilistic model for context based citation recommendation. In AAAI, pages 2404-2410, 2015
[7.20] D. Tang, B. Qin, T. Liu, and Y. Yang. User modeling with neural network for review rating prediction. In Proc. IJCAI, pages 1340-1346, 2015
[7.22] H. Wang, N. Wang, and D.-Y. Yeung. Collaborative deep learning for recommender systems. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, pages 1235-1244, New York, NY, USA, 2015. ACM.
[7.18] S. Sedhain, A. K. Menon, S. Sanner, and L. Xie. Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15 Companion, pages 111-112, New York, NY, USA, 2015. ACM.
[7.5] A. M. Elkahky, Y. Song, and X. He. A multi-view deep learning approach for cross domain user modeling in recommendation systems. In Proceedings of the 24th International Conference on World Wide Web, WWW’15, pages 278-288, New York, NY, USA, 2015. ACM.
[7.21] A. van den Oord, S. Dieleman, and B. Schrauwen. Deep content-based music recommendation. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2643-2651. Curran Associates, Inc., 2013.
How the references relate
In contrast to vast amount of research in matrix factorization methods [19], there is relatively little work using deep neural networks for recommendation systems. Neural networks are used for recommending news in [17], citations in [8] and review ratings in [20]. Collaborative filtering is formulated as a deep neural network in [22] and auto-encoders in [18]. Elkahky et al. used deep learning for cross domain user modeling [5]. In a content-based setting, van den Oord et al. used deep neural networks for music recommendation [21].
3. System diagram
The system contains two neural networks: candidate generation and ranking.
4. candidate generation
Given user \(U\) and context \(C\), the probability of watching video \(i\) at time \(t\) can be modeled with a softmax (multiclass logistic) model:
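The formula is missing from the notes. Using the symbols explained in the next paragraph, and writing \(w_t\) for the video watched at time \(t\) (a symbol not defined in the notes), the softmax model presumably reads:

\[
P(w_t=i\mid U,C)=\frac{e^{v_i u}}{\sum_{j\in V}e^{v_j u}}
\]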
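Here \(U\) and \(V\) denote the user and the videos respectively, while \(u\) and \(v\) play the role of the user and video latent factors in matrix factorization. They are dense vectors in \(\mathbb{R}^N\) (in contrast to the original sparse vectors).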
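A minimal Python sketch of this softmax over dense embeddings follows. The function name and the toy embeddings are hypothetical, and a real system would not score every video in the corpus exhaustively as done here.

```python
import math

def watch_probabilities(u, video_embeddings):
    """Softmax over the dot products between user vector u and each video vector."""
    logits = [sum(a * b for a, b in zip(u, v)) for v in video_embeddings]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (assumed), N = 2.
u = [1.0, 0.5]
video_embeddings = [[0.9, 0.1], [-0.3, 0.2], [0.2, 1.0]]
probs = watch_probabilities(u, video_embeddings)
best = max(range(len(probs)), key=probs.__getitem__)
```

The video whose embedding has the largest dot product with the user vector receives the highest probability, which mirrors how latent factors score items in matrix factorization.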