Samplingbiascorrected neural modeling for large corpus item recommendations

Samplingbiascorrected neural modeling for large corpus item recommendations
目录
Yi Y., Yang J., Hong L., Cheng D. Z., Heldt L., Kumthekar A., Zhao Z., Wei L. and Chi E. Sampling-bias-corrected neural modeling for large corpus item recommendations. In ACM Conference on Recommender Systems (RecSys), 2019.

概

在推荐系统中, 我们常常需要采样负样本, 一种比较普遍的方式是采用 in-batch 的方式来进行, 但是这种方法存在偏置, 作者将 here 的方法进行一个拓展.

符号说明
- \(\{x_i\}_{i=1}^N\), query 属性;
- \(\{y_j\}_{j=1}^M\), item 属性;
- 通过 \(u(\cdot), v(\cdot)\) 将二者映射到一个 \(k\) 维的空间上:
  \[u(x), v(x) \in \mathbb{R}^k, \]
  如下图所示:
- 两者间的分数由
  
  \[s(x, y) = \langle u(x), v(x) \rangle \]
  表示;
- \(\mathcal{T}:= \{(x_i, y_i, r_i)\}_{i=1}^T\), 训练数据集;
问题
- 我们通常会建模将 item \(y\) 从一堆 items \(\{y_j\}_{j=1}^M\) 择出来的概率:
  
  \[\mathcal{P}(y|x;\theta) = \frac{e^{(s(x, y))}}{\sum_{j\in [M]} e^{s(x, y_j)}}, \]
  然后通过如下的损失函数进行训练:
  
  \[L_T(\theta) := -\frac{1}{T} \sum_{i \in [T]} r_i \cdot \log (\mathcal{P}(y_i|x_i;\theta)), \]
  当 \(M\) 很大的时候, 其实计算是很复杂的, 因为我们需要计算每一个 \(\mathcal{P}(y_j|x_i; \theta)\);
- 所以现在的一个普遍的策略就是直接拿当前的 batch 的数据作为负样本:
  
  \[\mathcal{P}_B(y|x;\theta) = \frac{e^{(s(x, y))}}{\sum_{j\in [B]} e^{s(x, y_j)}}, \]
  但这会导致流行的 items 会被经常性地惩罚, 这里有一个 bias.
解决方法
- 我们需要对 score 进行矫正, 利用 here 中的方法定义:
  
  \[s^c(x_i, y_j) = s(x_i, y_j) - \log (p_j), \]
  这里用 \(p_j\) 来表示在一个 random batch 中采样到 \(j\) 的概率;
- 这时我们用如下损失进行训练:
  
  \[L_B(\theta) := -\frac{1}{B} \sum_{i \in [B]} r_i \cdot \log (\mathcal{P}_B^c(y_i|x_i;\theta)), \]
- 为了估计 \(p_j\), 作者维护两个列表: \(A, B\), 其中 \(A[j]\) 记录了上一次采样到 \(j\) 的时刻, 而 \(B[j]\) 则表示 \(j\) 被采样的频率, 假设在下一个 step \(t'\) 中采样到了 \(j\), 则更新:
  
  \[B[j] \leftarrow (1 - \alpha) \cdot B[j] + \alpha \cdot (t' - A[j]) \\ A[j] \leftarrow t', \]
  并用
  
  \[\hat{p}_j = 1 / B[j] \]
  来表示采样概率.
- 可以证明, 当 \(t \rightarrow +\infty\) 的时候, \(\hat{p}_j \rightarrow p_j\).
相关阅读:
mysql 大数据提取
 Python 爬虫实例（11）—— 爬虫苏宁易购
 Python 爬虫实例（10）—— 四行代码实现刷博客园阅读数量
 Python MySQLdb 批量插入封装
 Python MySQLdb insert（插入）封装
 Python MySQLdb select（选择）封装
 iOS开发-iPad侧边栏Tab选项卡切换
 iOS开发-xCode6(iOS 8)中应用程序图标和启动页面设置
 iOS开发-CoreMotion框架(加速计和陀螺仪)
iOS开发-JSON解析
原文地址：https://www.cnblogs.com/MTandHJ/p/16708683.html

Samplingbiascorrected neural modeling for large corpus item recommendations

概

符号说明

问题

解决方法