• 1. AutoFM: An Efficient Factorization Machine Model via Probabilistic Auto-encoders


    I. Title:

    AutoFM: an efficient factorization machine model via probabilistic auto-encoders

    II. Abstract

    Studies show that conventional factorization machines (FMs) perform poorly at capturing the local and global structures of user–item correlation simultaneously. Recently, deep neural networks (DNNs) have been applied to improve FMs. However, DNNs increase the complexity of the training process. Moreover, DNN-based FMs ignore the integration of neighborhood-based approaches. To resolve these issues, the present study proposes an efficient method called the factorization machine model via probabilistic auto-encoders (AutoFM). AutoFM extracts non-trivial, local structural characteristics from user–user/item–item co-occurrence pairs by integrating a low-complexity probabilistic auto-encoder. Furthermore, it supports both explicit and implicit feedback datasets. Extensive experiments on four real-world datasets demonstrate the effectiveness of the proposed method. The results show that AutoFM outperforms the current state-of-the-art methods in rating prediction tasks. Compared with DNN-based FM models, AutoFM improves item ranking by at least 1.16% to 4.37%.

    1. Introduction

    With the explosive growth of information on the internet, personalized recommendation systems play an increasingly indispensable role in overcoming information overload and promoting e-commerce. However, the existing user–item interactions (e.g., ratings, clicks, and purchases) and model capacity limit the effectiveness of recommendations [1]. For rating prediction and item ranking in recommendation systems, studies have shown, for example, that the ability to leverage high-order reasoning is a practical way to address the sparsity problem. Moreover, the factorization machine (FM) [2, 3] is a widely adopted and effective technique that exploits any auxiliary information beyond the user/item IDs to predict user behavior.

    The FM model represents the interaction of each feature pair as the inner product of the latent vectors of the two features. Further investigations revealed that the FM model performs reasonably well in diverse applications and yields promising results. FM can mimic the expressive capability of many factorization models, such as SVD++ [4] and nearest neighbor models [3, 5]. It is worth noting that only high-order information of input features is considered in the FM, which limits its capability to deal with nonlinear problems. Consequently, despite its outstanding performance on linear problems, the FM model cannot capture the underlying structure of more complex data in nonlinear problems [6]. Taking the classification task as a test case, not all problems are linearly separable after quadratic mapping [7]. Moreover, although FM can model high-order feature interactions in principle, only second-order feature interactions are usually considered in practice because of the high complexity [8].

    Recently, deep neural networks (DNNs) have been proposed to learn sophisticated feature interactions for prediction and ranking. Several DNN-based FM models have been proposed so far. Guo et al. [8] proposed the DeepFM model, which shares the feature embedding between the FM and DNN components. Moreover, NFM [6] is an enhanced FM model that handles nonlinear and higher-order feature interactions. In this scheme, nonlinear layers are stacked above the bi-interaction layer to deepen the expressive power of the FM model. However, training FM-based variants with deep structures is an enormous challenge [8]. Although the existing deep architectures contain latent factor models, the nonlinear integration of neighborhood-based methods is usually ignored [1].

    Based on the foregoing discussion, conventional FM models and their deep learning variants have three main drawbacks. Firstly, these models describe interactions linearly. Consequently, the nonlinear and inherently complex structure of real data cannot be accurately captured [6]. Secondly, although [3] shows that FM has the same expressive ability as the matrix factorization model and the nearest neighbor model, it is unclear how to integrate these models into FM. Thirdly, applying the pre-training strategy in the deep learning variants of the FM model (e.g., NFM) encounters three limitations: 1) it is an enormous challenge to prevent over-fitting during the training process; 2) the nonlinear integration of neighborhood-based methods is ignored [1]; 3) the data are not fully exploited, and the embedding parameters might be overly affected by the FM model [8]. To address these disadvantages, the present study proposes an efficient method. The proposed method, called the Factorization Machine Model via Probabilistic Auto-encoders (AutoFM), integrates the advantages of a global structure, composed of the FM and local neighborhood components, with those of probabilistic auto-encoders. Previous investigations [9, 10] demonstrated that the auto-encoder is an effective way to find a powerful feature representation of the input. Inspired by NECF [11], the proposed model generates the item embedding from the user–item data via a probabilistic auto-encoder. AutoFM acquires non-trivial, local structural characteristics from user–user/item–item co-occurrence pairs by integrating probabilistic auto-encoders. The performance of the proposed AutoFM is then evaluated in detail on four real-world datasets. AutoFM is expected to outperform the conventional methods in rating prediction and item ranking tasks.

    The main contributions of the present article can be summarized as follows:

    1. The deep learning variants of FM do not take full advantage of user–item data. The proposed probabilistic auto-encoder framework can better capture local structures and low-dimensional features between items and users from the user–item rating matrix. Moreover, the complexity analysis indicates that the proposed model can be extended to large-scale datasets, because the total computation time grows linearly with the number of observed values in the rating matrix.

    2. The performance of the proposed AutoFM is evaluated on four benchmark datasets. A reasonable improvement in rating prediction and item ranking tasks is obtained compared with conventional models. Moreover, the experimental results on a benchmark movie dataset show that the proposed method significantly improves the accuracy of the system and achieves satisfactory performance in terms of the coverage and diversity of recommendation lists.

    The remainder of the present article is organized as follows. Preliminary definitions and models are described in Sect. 2. The proposed methodology is discussed in detail in Sect. 3. Experimental results and discussions are provided in Sect. 4. Finally, conclusions and main achievements are presented in Sect. 5.

    2. Preliminaries

    In this section, the research problem is first defined, and then the existing solutions for explicit and implicit feedback are discussed. Finally, the basic principles of skip-gram with negative sampling (SGNS) and FM models are briefly discussed.

    2.1 Problem definition

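    Let U and I denote the sets of N users and M items, respectively. The corresponding rating/interaction matrix can then be denoted by Y, where Y_ui represents the preference of user u for item i.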

     

    A recommendation system with explicit feedback can generally be expressed as a rating prediction task. In such a system, the missing values of the rating matrix are estimated, and the items are then recommended to the user according to the estimated scores. In contrast, a recommendation system with implicit feedback is often expressed as an interaction prediction system. Since implicit feedback is usually binary or discrete, solving such a binary classification problem does not help to sort and recommend the items. Therefore, in practical applications, the implicit feedback value of the interaction prediction system is defined as a continuous value in [0, 1] so that it matches a rating prediction problem.

    2.2 Skip-gram with negative sampling

    The word2vec (W2V) model has been successfully applied in diverse natural language processing tasks [13, 14]. The main purpose of the W2V model is to learn vector representations of words that capture their correlation with the surrounding words. The word embedding hypothesis is that words in the same context are similar, so words are embedded in a low-dimensional continuous space to capture this similarity. The skip-gram with negative sampling (SGNS) variant of the W2V model predicts the surrounding words for a given word in a training set. Based on a list of "liked/disliked" items for each user, "liked/disliked" pairs of items that appear together can be generated no matter how many times each item is "liked/disliked". In particular, for one item in a given sequence of items, all other items are treated as its context. In this way, SGNS can be mapped to recommender systems [15–18].
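
    As an illustration of the SGNS principle described above, the following is a minimal sketch of the standard negative-sampling loss for one (target, context) pair. It is not taken from the AutoFM paper; the function names `sgns_loss`, `sigmoid`, and `dot` are hypothetical helpers, and the embeddings are assumed to be plain Python lists.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sgns_loss(target, context, negatives) -> float:
    """Standard SGNS loss for one positive pair and its sampled negatives.

    target, context: embedding vectors of an item and one item in its context.
    negatives: embedding vectors of sampled items assumed NOT to be in the context.
    """
    # Pull the observed (target, context) pair together.
    loss = -math.log(sigmoid(dot(target, context)))
    # Push the target away from each sampled negative.
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(target, neg)))
    return loss


if __name__ == "__main__":
    aligned = sgns_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
    misaligned = sgns_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
    print(aligned < misaligned)
```

    The loss is small when the target embedding agrees with its context and disagrees with the sampled negatives, which is the behavior the item–item and user–user co-occurrence pairs are meant to exploit.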

    2.3 FM

    The idea of the factorization machine (FM) [2, 19] is to learn a polynomial kernel by expressing higher-order terms as low-dimensional inner products of latent factor vectors. Studies show that this latent factor formulation is well suited to mimicking other factorization models. FMs combine the generality of feature engineering with a factorization model that estimates the interactions between different variables. Furthermore, factorization machines adapt well to sparse data, run in linear time, and are versatile across various prediction tasks [6].
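
    To make the linear-time claim concrete, here is a minimal sketch of the standard second-order FM prediction, y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, together with the well-known reformulation of the pairwise term that avoids the double loop. The names `fm_predict` and `fm_predict_naive` and the toy parameters are illustrative assumptions, not code from the paper.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM with an explicit loop over feature pairs: O(k * n^2)."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += inner * x[i] * x[j]
    return y


def fm_predict(x, w0, w, V):
    """Same model in O(k * n), using
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


if __name__ == "__main__":
    # Toy input: 4 features (e.g. a concatenated one-hot user/item vector), k = 2.
    x = [1.0, 0.0, 1.0, 1.0]
    w0, w = 0.5, [0.1, 0.2, 0.3, 0.4]
    V = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0], [2.0, 0.0]]
    print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

    With sparse inputs, only the non-zero entries of x contribute to each sum, which is why FM remains efficient on one-hot encoded user and item features.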

     

     

    3. Proposed framework

    This section introduces a new framework, the factorization machine model via probabilistic auto-encoders (AutoFM), which is suitable for both explicit and implicit feedback.

    3.1 General framework


    Figure 1 illustrates the architecture of the proposed AutoFM model. For clarity, the linear regression part, which is not relevant here, is omitted. Figure 1 indicates that the AutoFM model consists of three consecutive stages: (1) Generating the user–user/item–item co-occurrence pairs from the user–item rating matrix. (2) Learning the latent vectors of users (Uauto) and items (Iauto) via probabilistic auto-encoders, and then using them to initialize the user and item latent vectors of the AutoFM model. (3) Integrating the latent vectors Uauto and Iauto with the one-hot encodings of users and items to carry out FM learning and prediction. The FM model is used because of its high accuracy and low complexity, and because it can be extended with additional content such as context information [3, 20, 21]. These stages are described separately in the following.

    This stage is inspired by word embedding algorithms [15, 18], with each user and item corresponding to a single word. The main purpose of word embedding is to learn vector representations of words that capture correlations with the surrounding words. Word embedding assumes that words appearing in the same context are similar, so words are embedded in a low-dimensional continuous space to capture this similarity. If an item/user is regarded as a word, then user–user/item–item co-occurrence pairs can be formed from users/items that share the same rating score for the same item/user, so that word embedding can be mapped to recommender systems. The co-occurrence pairs of items that received the same rating value from the same user belong to one sentence. Similarly, the co-occurrence pairs of users who gave the same rating value to the same item belong to another sentence. Figure 2 illustrates rating matrix examples for an explicit feedback dataset. It is worth noting that real-valued ratings are usually used and treated as ordinal ratings on an explicit feedback dataset.
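
    The pair-generation stage described above can be sketched as follows. This is only an illustrative reading of the text: the function name `cooccurrence_pairs`, the (user, item, rating) triple format, and the choice to keep repeated pairs are assumptions, not details confirmed by the paper.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_pairs(ratings, by="user"):
    """Build co-occurrence pairs from (user, item, rating) triples.

    by="user": item-item pairs, i.e. items given the same rating by the same user.
    by="item": user-user pairs, i.e. users who gave the same rating to the same item.
    """
    groups = defaultdict(list)  # one "sentence" per (owner, rating value)
    for user, item, rating in ratings:
        if by == "user":
            groups[(user, rating)].append(item)
        else:
            groups[(item, rating)].append(user)
    pairs = []
    for members in groups.values():
        # Every unordered pair inside a sentence co-occurs.
        pairs.extend(combinations(sorted(set(members)), 2))
    return pairs


if __name__ == "__main__":
    ratings = [
        ("u1", "i1", 5), ("u1", "i2", 5), ("u1", "i3", 3),
        ("u2", "i1", 5), ("u2", "i3", 3),
    ]
    print(cooccurrence_pairs(ratings, by="user"))  # item-item pairs
    print(cooccurrence_pairs(ratings, by="item"))  # user-user pairs
```

    In this toy example, u1 rates i1 and i2 identically, so they form an item–item pair, while u1 and u2 agree on both i1 and i3 and therefore co-occur twice as a user–user pair.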

     

     

     

    Unlike explicit feedback datasets, implicit feedback datasets contain only positive samples. However, since the co-occurrence pairs of the proposed model require both negative and positive samples, implementing the proposed model on an implicit feedback dataset is an enormous challenge. To resolve this problem, each missing value is simply assumed to be negative feedback with equal probability. Some negative examples are then selected from the missing values with equal weight. In the present article, a fixed number of negative samples is drawn for each positive instance from interactions that have never been observed.
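
    A minimal sketch of this uniform negative sampling is given below. The function name `sample_negatives`, the dictionary input format, the 1/0 labels, and sampling with replacement are assumptions made for illustration; the paper only states that a fixed number of negatives is drawn with equal weight from unobserved interactions.

```python
import random


def sample_negatives(positives, all_items, num_neg, seed=0):
    """Return (user, item, label) triples with label 1 for observed
    interactions and label 0 for uniformly sampled unobserved ones.

    positives: dict mapping each user to the set of items they interacted with.
    num_neg: fixed number of negatives drawn per positive instance.
    """
    rng = random.Random(seed)
    samples = []
    for user in sorted(positives):
        observed = positives[user]
        # Items this user never interacted with, each equally likely to be drawn.
        candidates = sorted(set(all_items) - set(observed))
        for item in sorted(observed):
            samples.append((user, item, 1))
            if not candidates:
                continue  # user has interacted with everything
            for _ in range(num_neg):
                samples.append((user, rng.choice(candidates), 0))
    return samples


if __name__ == "__main__":
    positives = {"u1": {"i1", "i2"}, "u2": {"i3"}}
    items = ["i1", "i2", "i3", "i4"]
    for row in sample_negatives(positives, items, num_neg=2):
        print(row)
```

    The resulting labeled triples can then be treated like a rating matrix with two rating values, which is one way to feed implicit feedback into the co-occurrence pair construction.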

  • Original article: https://www.cnblogs.com/zhangxianrong/p/14978761.html