• word2vec: from setup to usage


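    (1) First, download word2vec from https://code.google.com/p/word2vec/. If Google is unreachable and the download fails, the package can also be downloaded from CSDN.
    After extraction, the directory looks like this: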
     
    w2v/
    `-- trunk
        |-- LICENSE
        |-- README.txt
        |-- compute-accuracy.c
        |-- demo-analogy.sh
        |-- demo-classes.sh
        |-- demo-phrase-accuracy.sh
        |-- demo-phrases.sh
        |-- demo-train-big-model-v1.sh
        |-- demo-word-accuracy.sh
        |-- demo-word.sh
        |-- distance.c
        |-- makefile
        |-- questions-phrases.txt
        |-- questions-words.txt
        |-- word-analogy.c
        |-- word2phrase.c
        `-- word2vec.c
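    (2) Enter the w2v/trunk directory and run make to compile the sources. As the makefile shows, the two main files to compile are word2vec.c and distance.c, which produce the word2vec and distance executables. If compilation fails, see http://blog.csdn.net/zshunmiao/article/details/15339105 for a fix.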
    The makefile reads as follows:
    CC = gcc
    #Using -Ofast instead of -O3 might result in faster code, but is supported only by newer GCC versions
    CFLAGS = -lm -pthread -O3 -march=native -Wall -funroll-loops -Wno-unused-result
    
    all: word2vec word2phrase distance word-analogy compute-accuracy
    
    word2vec : word2vec.c
            $(CC) word2vec.c -o word2vec $(CFLAGS)
    word2phrase : word2phrase.c
            $(CC) word2phrase.c -o word2phrase $(CFLAGS)
    distance : distance.c
            $(CC) distance.c -o distance $(CFLAGS)
    word-analogy : word-analogy.c
            $(CC) word-analogy.c -o word-analogy $(CFLAGS)
    compute-accuracy : compute-accuracy.c
            $(CC) compute-accuracy.c -o compute-accuracy $(CFLAGS)
            chmod +x *.sh
    
    clean:
            rm -rf word2vec word2phrase distance word-analogy compute-accuracy

    (3) Now you can run a demo: execute ./demo-word.sh.
    Then enter a word, and the tool lists the words closest to it, sorted from most to least similar. The column labeled "Cosine distance" holds the cosine similarity, so a higher value means a closer word.
    Enter word or sentence (EXIT to break): china       
    
    Word: china  Position in vocabulary: 486
    
                                                  Word       Cosine distance
    ------------------------------------------------------------------------
                                                 japan              0.648631
                                                taiwan              0.630534
                                             manchuria              0.599535
                                                 tibet              0.583566
                                                   prc              0.560898
                                              kalmykia              0.558937
                                                xiamen              0.556037
                                                 jiang              0.553501
                                               chinese              0.547065
                                                  liao              0.543676
                                                 india              0.536273
                                                 korea              0.534758
                                                   roc              0.530741
                                              thailand              0.529334
                                                 hunan              0.527629
                                                 liang              0.527374
                                              shanghai              0.526314
                                             chongqing              0.525559
                                               nanjing              0.521342
                                                yunnan              0.518669
                                                 wuhan              0.516914
                                                  zhao              0.513246
                                              xinjiang              0.509939
                                                  tuva              0.507322
                                             guangdong              0.507288
                                                 hubei              0.505540
                                               guangxi              0.501068
                                                taipei              0.497673
                                                 macao              0.497303
                                                hainan              0.494808
                                              shandong              0.493323
                                              shenzhen              0.491871
                                              hangzhou              0.489323
                                                balhae              0.488846
                                             guangzhou              0.486907
                                                fujian              0.485473
                                              zhejiang              0.485011
                                                harbin              0.483171
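
    The ranking above can be illustrated with a minimal sketch of a nearest-word lookup by cosine similarity. This is not the code of distance.c. The function names and the three-dimensional toy vectors are hypothetical, chosen only for illustration, whereas the real tool works on the vectors trained by word2vec.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(query, vectors, top_n=3):
    """Return the top_n words most similar to `query`, best match first."""
    query_vec = vectors[query]
    scored = [
        (word, cosine_similarity(query_vec, vec))
        for word, vec in vectors.items()
        if word != query  # skip the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy vectors; real word2vec vectors have far more dimensions.
toy_vectors = {
    "china": [1.0, 0.0, 0.0],
    "japan": [0.8, 0.6, 0.0],
    "taiwan": [0.6, 0.8, 0.0],
    "banana": [0.0, 0.0, 1.0],
}

for word, score in nearest_words("china", toy_vectors):
    print(f"{word:>10} {score:.6f}")
```

    With these made-up vectors, "japan" ranks ahead of "taiwan" and "banana" comes last, mirroring how the demo output is sorted by descending similarity.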
  • Original article: https://www.cnblogs.com/xiamaogeng/p/4616173.html