• Loading embeddings with gensim


    Load word2vec-format embeddings with the gensim package:

    
    from gensim.models.keyedvectors import KeyedVectors
    
    twitter_embedding_path = 'twitter_embedding.emb'
    twitter_vocab_path = 'twitter_model.vocab'
    foursquare_embedding_path = 'foursquare_embedding.emb'
    foursquare_vocab_path = 'foursquare_model.vocab'
    
    # load the embedding vector using gensim
    x_vectors = KeyedVectors.load_word2vec_format(foursquare_embedding_path, binary=False, fvocab=foursquare_vocab_path)
    y_vectors = KeyedVectors.load_word2vec_format(twitter_embedding_path, binary=False, fvocab=twitter_vocab_path)
    
    # Note: `.vocab` is the gensim 3.x attribute; gensim 4.0+ replaced it with `key_to_index`.
    print('type(x_vectors)', type(x_vectors))
    print('type(x_vectors.vocab)', type(x_vectors.vocab))
    print('type(x_vectors.vocab.keys())', type(x_vectors.vocab.keys()))
    

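    Content of 'twitter_embedding.emb' (the first line gives the number of vectors and their dimensionality; each following line is a token followed by its vector):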

    5120 64
    BarackObama -0.079930 0.106491 -0.075812 -0.026447 ...
    mashable 0.046692 -0.038019 -0.055519 ...
    ...

    Content of 'twitter_model.vocab' (one token and its count per line):

    BarackObama 3475971
    mashable 2668606
    JonahLupton 2515250
    instagram 2359886
    TheEllenShow 2292545
    cnnbrk 2157283
    nytimes 2141588
    foursquare 2021352

    ...
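
To make the two layouts above concrete, here is a minimal sketch that parses them with plain Python, without gensim. The helper names `read_word2vec_text` and `read_vocab` are hypothetical, and the toy vectors are made up for illustration (only the tokens and counts are taken from the listing above). It assumes tokens contain no spaces.

```python
import os
import tempfile


def read_word2vec_text(emb_path):
    """Parse a text-format embedding file into ({token: [floats]}, dim)."""
    vectors = {}
    with open(emb_path, encoding="utf-8") as f:
        # Header line: "<number of vectors> <dimensionality>"
        count, dim = (int(x) for x in f.readline().split())
        for line in f:
            if not line.strip():
                continue  # tolerate trailing blank lines
            parts = line.split()
            if len(parts) != dim + 1:
                raise ValueError("unexpected number of fields: %r" % line)
            vectors[parts[0]] = [float(v) for v in parts[1:]]
    if len(vectors) != count:
        raise ValueError("header says %d vectors, found %d" % (count, len(vectors)))
    return vectors, dim


def read_vocab(vocab_path):
    """Parse a vocab file with one "<token> <count>" pair per line."""
    counts = {}
    with open(vocab_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            token, count = line.split()
            counts[token] = int(count)
    return counts


# Toy files in the same layout (vector values are illustrative only).
tmp_dir = tempfile.mkdtemp()
emb_path = os.path.join(tmp_dir, "toy.emb")
vocab_path = os.path.join(tmp_dir, "toy.vocab")
with open(emb_path, "w", encoding="utf-8") as f:
    f.write("2 3\nBarackObama -0.5 0.25 0.125\nmashable 0.5 -0.25 0.0\n")
with open(vocab_path, "w", encoding="utf-8") as f:
    f.write("BarackObama 3475971\nmashable 2668606\n")

vectors, dim = read_word2vec_text(emb_path)
counts = read_vocab(vocab_path)
print(dim, vectors["mashable"], counts["BarackObama"])
```

In practice `KeyedVectors.load_word2vec_format` does this work and returns an indexed structure; the sketch is only meant to show what the files contain.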

    Writing the embeddings to a file

    Reference snippet for training a model and writing its embeddings to a file:

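    from gensim.models import word2vec

    embedding_path = data_path + 'embedding/'
    # ....
    # `size` and `iter` are the gensim 3.x parameter names (gensim 4.0+ uses `vector_size` and `epochs`)
    modelX = word2vec.Word2Vec(walkList_x, negative=10, sg=1, hs=0, size=100, window=4, min_count=0, workers=15, iter=30)
    # save the embedding vectors and, via fvocab, the vocabulary with counts
    modelX.wv.save_word2vec_format(embedding_path + 'twitter.emb', fvocab=embedding_path + 'twitter.vocab')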
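
As a rough illustration of what the save call above produces, the following sketch writes a pair of files in the same text layout using plain Python. It is not gensim's implementation: the function name `write_word2vec_text`, the six-decimal formatting, and the toy vectors are assumptions made for the example. Tokens are ordered by descending count, matching the vocab listing shown earlier.

```python
import os
import tempfile


def write_word2vec_text(vectors, counts, emb_path, vocab_path):
    """Write {token: [floats]} and {token: count} in word2vec text layout."""
    # Most frequent tokens first; tokens must not contain whitespace.
    tokens = sorted(vectors, key=lambda t: -counts.get(t, 0))
    dim = len(vectors[tokens[0]]) if tokens else 0
    with open(emb_path, "w", encoding="utf-8") as f:
        # Header line: "<number of vectors> <dimensionality>"
        f.write("%d %d\n" % (len(tokens), dim))
        for t in tokens:
            f.write(t + " " + " ".join("%.6f" % v for v in vectors[t]) + "\n")
    with open(vocab_path, "w", encoding="utf-8") as f:
        for t in tokens:
            f.write("%s %d\n" % (t, counts.get(t, 0)))


# Toy data: vector values are illustrative only.
toy_vectors = {"mashable": [0.5, -0.25], "BarackObama": [-0.5, 0.25]}
toy_counts = {"BarackObama": 3475971, "mashable": 2668606}

out_dir = tempfile.mkdtemp()
emb_out = os.path.join(out_dir, "toy.emb")
vocab_out = os.path.join(out_dir, "toy.vocab")
write_word2vec_text(toy_vectors, toy_counts, emb_out, vocab_out)

with open(emb_out, encoding="utf-8") as f:
    emb_lines = f.read().splitlines()
with open(vocab_out, encoding="utf-8") as f:
    vocab_lines = f.read().splitlines()
print(emb_lines)
print(vocab_lines)
```

Files written this way follow the same layout that `KeyedVectors.load_word2vec_format(..., binary=False, fvocab=...)` reads.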
    
  • Original article: https://www.cnblogs.com/sonictl/p/11220479.html