    Text Classification

    This section concerns text classification used for the extrinsic evaluation of word embeddings, i.e., as a downstream task.

    Some of the concepts here are drawn from the Fudan University NLP Group (复旦大学NLP组).

    Statistics-Based Methods

    Logistic Regression

    Text classification from a statistical perspective is described as follows [Li Y. 2015]:

    We use Tencent news titles as our text classification dataset. A total of 8,826 titles of four categories (society, entertainment, healthcare, and military) are extracted. The lengths of titles range from 10 to 20 words. We train ℓ2-regularized logistic regression classifiers using the LIBLINEAR package (Fan et al, 2008) with the learned embeddings.
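    As a rough illustration of that setup, the sketch below trains an ℓ2-regularized logistic regression classifier using scikit-learn's liblinear solver as a stand-in for the LIBLINEAR package; the title vectors and labels are hypothetical placeholders for features derived from the learned embeddings.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # X: one embedding-derived feature vector per news title (placeholder data)
    # y: category labels, e.g. society / entertainment / healthcare / military
    X = np.random.randn(8826, 200)
    y = np.random.randint(0, 4, size=8826)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # l2-regularized logistic regression, mirroring the quoted experiment
    clf = LogisticRegression(penalty="l2", solver="liblinear", C=1.0)
    clf.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))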

    A similar procedure is described in [Kiros 2015]:

    On all datasets, we simply extract skip-thought vectors and train a logistic regression classifier on top.

    [Yan Song 2018] also applied this kind of method.

    This document classification experiment is performed in a conventional way as that in previous studies [Kiela et al., 2015; Kiros et al., 2015]. For all the documents in training and test datasets, we first construct document level representations by averaging the embeddings from all words in a given document. A logistic regression classifier is then trained on top of the resulted document level representations on the training set and evaluated on the test set.
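    A minimal sketch of that document-level averaging step is given below; the embedding lookup table, its dimensionality, and the tokenized document are assumptions made for illustration, and out-of-vocabulary words are simply skipped.

    import numpy as np

    def document_vector(tokens, embeddings, dim):
        """Average the word vectors of the tokens; return zeros if none are in the vocabulary."""
        vecs = [embeddings[t] for t in tokens if t in embeddings]
        if not vecs:
            return np.zeros(dim)
        return np.mean(vecs, axis=0)

    # Hypothetical embedding table and tokenized document
    embeddings = {"market": np.random.randn(100), "news": np.random.randn(100)}
    doc = ["market", "news", "unseen_word"]
    vec = document_vector(doc, embeddings, dim=100)
    print(vec.shape)  # (100,)

    The resulting document vectors would then be fed to a logistic regression classifier exactly as in the earlier sketch.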

    Linear SVM

    It is described as follows [Kiela 2015]:

    we first construct document-level representations by summing the vector representations for all words in a given document. After setting aside a small development set for tuning the hyperparameters of the supervised algorithm, we train a support vector machine (SVM) classifier with a linear kernel and evaluate document topic classification accuracy using ten-fold cross-validation.
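    The sketch below indicates this protocol with scikit-learn: document vectors obtained by summing word embeddings (placeholder data here), a linear-kernel SVM, and ten-fold cross-validation; tuning the regularization strength C on a small development set is only hinted at in a comment.

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    # X: document vectors built by summing word embeddings (placeholder data)
    X = np.random.randn(2000, 100)
    y = np.random.randint(0, 4, size=2000)

    # In the original protocol, C would first be tuned on a small development set.
    clf = LinearSVC(C=1.0)
    scores = cross_val_score(clf, X, y, cv=10)  # ten-fold cross-validation
    print("mean accuracy:", scores.mean())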

    Bibliography

    复旦大学NLP组 (Fudan University NLP Group). NLP-Beginner. https://github.com/FudanNLP/nlp-beginner

    [Li Y. 2015] Li Y., Li W., Sun F., et al. "Component-Enhanced Chinese Character Embeddings." EMNLP (2015): 829-834.

    [Kiros 2015] Kiros, Ryan, et al. "Skip-Thought Vectors." Advances in Neural Information Processing Systems 28 (2015).

    [Yan Song 2018] Song, Yan, et al. "Joint Learning Embeddings for Chinese Words and their Components via Ladder Structured Networks." IJCAI (2018).

    [Kiela 2015] Kiela, Douwe, et al. "Specializing Word Embeddings for Similarity or Relatedness." EMNLP (2015).
