• jieba—parallel


    A quick test of jieba's parallel processing. Note: parallel segmentation supports only the default tokenizers, jieba.dt and jieba.posseg.dt.

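    import time
    import jieba
    
    # Parallel mode is built on multiprocessing; with no argument
    # it starts one worker process per CPU core.
    jieba.enable_parallel()
    
    # Read raw bytes so that len(content) is the byte count used in the speed figure.
    with open("/ssd/ailab-dataset/THUCNewsSubset/cnews.train.txt", "rb") as f:
        content = f.read()
    
    # Time only the segmentation step.
    t1 = time.time()
    words = "/ ".join(jieba.cut(content))
    t2 = time.time()
    tm_cost = t2 - t1
    
    # Save the segmented text for inspection.
    with open("1.log", "wb") as log_f:
        log_f.write(words.encode('utf-8'))
    
    print('speed %s bytes/second' % (len(content) / tm_cost))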
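
jieba's README describes parallel mode as splitting the input text by line, handing the lines to several Python processes, and merging the results. The following is a minimal sketch of that idea, not jieba's actual implementation: `tokenize_line` and `parallel_tokenize` are hypothetical names, whitespace splitting stands in for real word segmentation, and the `fork` start method is assumed (Unix-like systems only).

```python
import multiprocessing as mp


def tokenize_line(line):
    # Stand-in tokenizer (hypothetical): whitespace split instead of jieba.cut.
    return line.split()


def parallel_tokenize(text, processes=2):
    # Split the text into lines; line endings are dropped in this sketch.
    lines = text.splitlines()
    # Assumption: the "fork" start method is available (Unix-like systems).
    ctx = mp.get_context("fork")
    with ctx.Pool(processes) as pool:
        # pool.map preserves input order, so tokens come back in document order.
        per_line = pool.map(tokenize_line, lines)
    # Merge the per-line token lists into one flat list.
    return [tok for toks in per_line for tok in toks]


if __name__ == "__main__":
    sample = "first line of text\nsecond line here\nthird"
    print("/ ".join(parallel_tokenize(sample)))
```

Because the work is handed out line by line, an input with very few lines would probably gain little from this scheme.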

     Test results:

    # With jieba.enable_parallel() commented out
    [root@n6 jieba-parallel-test]# python test.py      
    Building prefix dict from the default dictionary ...
    Loading model from cache /tmp/jieba.cache
    Loading model cost 0.289 seconds.
    Prefix dict has been built succesfully.
    speed 259919.622884 bytes/second
    
    # With jieba.enable_parallel() enabled
    [root@n6 jieba-parallel-test]# vi test.py
    [root@n6 jieba-parallel-test]# vi test.py
    [root@n6 jieba-parallel-test]# python test.py
    Building prefix dict from the default dictionary ...
    Loading model from cache /tmp/jieba.cache
    Loading model cost 0.263 seconds.
    Prefix dict has been built succesfully.
    speed 2215307.40079 bytes/second

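     With parallel mode enabled, throughput rises from about 260 KB/s to about 2.2 MB/s, so segmentation is much faster.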
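
To put a number on the gain, the ratio of the two measured throughputs can be computed directly. This small sketch only reuses the figures printed in the two runs above; the variable names are illustrative.

```python
# Throughput figures copied from the two runs above (bytes/second).
serial_speed = 259919.622884     # enable_parallel() commented out
parallel_speed = 2215307.40079   # enable_parallel() enabled

speedup = parallel_speed / serial_speed
print("speedup: %.2fx" % speedup)  # prints "speedup: 8.52x"
```

The ratio depends on the number of CPU cores on the test machine, which the post does not state, so other machines may see a different figure.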

  • Original article: https://www.cnblogs.com/helloworld0604/p/9633806.html