• Computing node similarity in Python with the DeepWalk model


    Notes (still to be organized)
    GitHub: https://github.com/prateekjoshi565/DeepWalk
    Method:
    https://blog.csdn.net/gdh756462786/article/details/79108665/

    I. Installing the dependencies straight from requirements.txt fails with:

    ImportError: cannot import name 'Vocab' from 'gensim.models.word2vec'

    The fix is to pin gensim to version 3.8.3.
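The import error most likely appears because newer gensim releases (the 4.x series) no longer expose `Vocab` in `gensim.models.word2vec`, which is why pinning 3.8.3 helps. A minimal sketch of a guard that checks the installed version before running deepwalk; `gensim_supports_vocab` is a hypothetical helper name, and the "major version below 4" rule is an assumption based on the fix above:

```python
def gensim_supports_vocab(version: str) -> bool:
    """Return True if the gensim version string is older than 4.0.

    Assumption: `Vocab` is importable from gensim.models.word2vec only
    in the 3.x series and earlier, which matches the 3.8.3 fix above.
    """
    major = int(version.split(".")[0])
    return major < 4


if __name__ == "__main__":
    try:
        import gensim
    except ImportError:
        print("gensim is not installed; try: pip install gensim==3.8.3")
    else:
        if gensim_supports_vocab(gensim.__version__):
            print("gensim", gensim.__version__, "should work with deepwalk")
        else:
            print("gensim", gensim.__version__, "is too new; pin gensim==3.8.3")
```

Running this before `pip install -r requirements.txt` makes the version mismatch visible up front instead of at import time.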

     

    II. Step-by-step procedure

    Download the source code
    https://github.com/phanein/deepwalk

    Dataset definition
    http://leitang.net/social_dimension.html

    Core code

    walks = graph.build_deepwalk_corpus(G, num_paths=args.number_walks, path_length=args.walk_length, alpha=0, rand=random.Random(args.seed))
    
    print("Training...")
    
    model = Word2Vec(walks, size=args.representation_size, window=args.window_size, min_count=0, workers=args.workers)
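To make the two core calls less opaque, here is a minimal sketch of what building a DeepWalk corpus amounts to: start a fixed number of uniform random walks from every node and treat each walk as a "sentence" for Word2Vec. This is an illustration, not the code of `graph.build_deepwalk_corpus`; the function names and the toy adjacency dict are hypothetical, and the restart probability is left out, which corresponds to `alpha=0` in the call above.

```python
import random


def random_walk(adj, start, path_length, rand):
    """Return one uniform random walk of at most path_length nodes."""
    walk = [start]
    while len(walk) < path_length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rand.choice(neighbors))
    # Word2Vec expects tokens, so node ids are turned into strings.
    return [str(node) for node in walk]


def build_corpus(adj, num_paths, path_length, rand):
    """Start num_paths walks from every node, reshuffling the node order each pass."""
    walks = []
    nodes = list(adj)
    for _ in range(num_paths):
        rand.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, path_length, rand))
    return walks


# Toy undirected graph as an adjacency dict (made-up data).
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
walks = build_corpus(adj, num_paths=2, path_length=5, rand=random.Random(0))
print(len(walks), "walks, for example", walks[0])
```

The resulting list of token lists is the kind of input that `Word2Vec(walks, ...)` consumes; nodes that co-occur within `window` steps of a walk end up with similar vectors.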


    Installation

    cd deepwalk-master
    pip install -r requirements.txt
    python setup.py install


    Reproducing the experimental results
    1. BlogCatalog dataset

    Generate the embeddings

    deepwalk --format mat --input example_graphs/blogcatalog.mat --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/blogcatalog.embeddings
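A quick bit of arithmetic shows how large the walk corpus is for this command. As far as I can tell from the phanein/deepwalk entry point, the tool compares (number of nodes × number of walks × walk length) against `--max-memory-data-size` and writes the walks to disk when the product is not smaller, so a limit of 0 always takes the on-disk path. The sketch below assumes the commonly reported BlogCatalog size of 10,312 nodes, which the post itself does not state; `corpus_size` is a hypothetical helper.

```python
def corpus_size(num_nodes, number_walks, walk_length):
    """Return (number of walks, total node visits) for one DeepWalk run."""
    num_walks = num_nodes * number_walks
    return num_walks, num_walks * walk_length


# 10312 is the commonly reported BlogCatalog node count (an assumption here).
num_walks, data_size = corpus_size(10312, number_walks=80, walk_length=40)
max_memory_data_size = 0  # value passed on the command line above

print("walks:", num_walks)
print("data size:", data_size)
# Assumed rule: keep walks in memory only if data_size is below the limit.
print("in memory" if data_size < max_memory_data_size else "dumped to disk")
```

With roughly 33 million node visits, expect this run to take noticeably longer than the Karate example, especially with `--workers 1`.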


    Evaluation

    python example_graphs/scoring.py --emb example_graphs/blogcatalog.embeddings --network example_graphs/blogcatalog.mat --num-shuffles 10 --all


    2. Karate dataset

    Generate the embeddings

    --format defaults to adjlist, so it can be omitted for a .adjlist file

    deepwalk --input example_graphs/karate.adjlist --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/karate.embeddings
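Once the embeddings file exists, node similarity can be read off as the cosine similarity between vectors. The sketch below assumes the output is in word2vec text format (a header line with the node count and dimension, then one line per node with its id followed by the vector components). The helper names are hypothetical, and the three 2-dimensional vectors are made-up stand-ins for `karate.embeddings`.

```python
import io
import math


def load_embeddings(lines):
    """Parse word2vec text format: a 'count dim' header, then 'id v1 v2 ...' rows."""
    rows = iter(lines)
    count, dim = map(int, next(rows).split())
    vectors = {}
    for line in rows:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        vector = [float(x) for x in parts[1:]]
        if len(vector) != dim:
            raise ValueError("unexpected vector length for node " + parts[0])
        vectors[parts[0]] = vector
    if len(vectors) != count:
        raise ValueError("header count does not match number of rows")
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, node, topn=3):
    """Return the topn (node id, similarity) pairs closest to the given node."""
    scores = [(other, cosine(vectors[node], vec))
              for other, vec in vectors.items() if other != node]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up embeddings; for a real run, pass an open file handle instead, e.g.
# open("example_graphs/karate.embeddings").
sample = io.StringIO("3 2\n1 1.0 0.0\n2 0.9 0.1\n3 0.0 1.0\n")
vectors = load_embeddings(sample)
print(most_similar(vectors, "1", topn=2))
```

Node ids are kept as strings because that is how they appear in the file; convert with `int()` if the rest of the pipeline uses integer ids.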


    Evaluation

    --network must be a .mat file

    The options are as follows:

    usage: scoring [-h] --emb EMB --network NETWORK
                   [--adj-matrix-name ADJ_MATRIX_NAME]
                   [--label-matrix-name LABEL_MATRIX_NAME]
                   [--num-shuffles NUM_SHUFFLES] [--all]

    optional arguments:
      -h, --help            show this help message and exit
      --emb EMB             Embeddings file (default: None)
      --network NETWORK     A .mat file containing the adjacency matrix and node
                            labels of the input network. (default: None)
      --adj-matrix-name ADJ_MATRIX_NAME
                            Variable name of the adjacency matrix inside the .mat
                            file. (default: network)
      --label-matrix-name LABEL_MATRIX_NAME
                            Variable name of the labels matrix inside the .mat
                            file. (default: group)
      --num-shuffles NUM_SHUFFLES
                            Number of shuffles. (default: 2)
      --all                 The embeddings are evaluated on all training percents
                            from 10 to 90 when this flag is set to true. By
                            default, only training percents of 10, 50 and 90 are
                            used. (default: False)





    Reference: https://blog.csdn.net/YizhuJiao/article/details/81095346

    GitHub: https://github.com/phanein/deepwalk

  • Original article: https://www.cnblogs.com/StarZhai/p/15545387.html