• Python tools: wordcloud


    Generating word clouds

    Install the wordcloud module

    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple/ wordcloud

    Build a word cloud from a single repeated word

    import numpy as np
    from wordcloud import WordCloud
    
    text = "square"
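    x, y = np.ogrid[:300, :300]    # coordinate grids for a 300 x 300 image
    
    # True for every pixel outside a circle of radius 130 centered at (150, 150)
    mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
    mask = 255 * mask.astype(int)    # 255 (white) = masked out, 0 = free to draw on
    
    # repeat=True repeats the word until max_words or min_font_size is reached
    wc = WordCloud(background_color="white", repeat=True, mask=mask)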
    wc.generate(text)
    wc.to_file('wc.png')
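
    The mask is the least obvious part of the example above. The sketch below rebuilds the same array with NumPy alone and inspects a few pixels; the name `outside` is introduced here for readability and does not appear in the original code. wordcloud treats white (255) entries as masked out, so the words end up inside the circle.

```python
import numpy as np

# Same construction as in the example above: a 300 x 300 grid of coordinates.
x, y = np.ogrid[:300, :300]

# Boolean array: True for pixels farther than 130 from the center (150, 150).
outside = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2

# Convert to 0 / 255 so that "outside" pixels are white (masked out).
mask = 255 * outside.astype(int)

print(mask.shape)         # shape of the mask array
print(mask[150, 150])     # center pixel: inside the circle
print(mask[0, 0])         # corner pixel: outside the circle
print(np.unique(mask))    # distinct values present in the mask
```

    Any shape can be used in the same way, as long as the area to be left empty is white.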

    Generate a word cloud from a single sentence

    from wordcloud import WordCloud
    wc = WordCloud()    # create the word cloud object
    wc.generate('This is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.')    # generate the word cloud
    wc.to_file('wc.png')    # save the word cloud

    Generate a word cloud from a txt file

    import os
    from os import path
    
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    
    # Directory of this script, or the working directory in an interactive session
    d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()
    with open(path.join(d, 'test.txt')) as f:
        text = f.read()
    
    wordcloud = WordCloud(max_font_size=40).generate(text)
    plt.figure()
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()

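    Generating a word cloud file takes three steps:

       1. Configure the object's parameters

       2. Load the text for the word cloud

       3. Output the word cloud file (unless specified otherwise, the image defaults to 400 x 200 pixels)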

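    wordcloud builds its word-frequency statistics in the following steps:

    1. Split: separate the text into words (on whitespace and punctuation)

    2. Count: count how often each word occurs and filter out unwanted words

    3. Font: assign each word a font size according to its count

    4. Layout: arrange the words in the image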
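
    To make steps 1 to 3 concrete, here is a minimal standard-library sketch. It is not wordcloud's own implementation: the small `STOPWORDS` set, the regular expression and the linear size scaling are all assumptions made for illustration (the library's real sizing also depends on settings such as `relative_scaling` and on the space available). Step 4, the layout, is left out.

```python
import re
from collections import Counter

# Hypothetical stopword list for this sketch; wordcloud ships a much larger one.
STOPWORDS = {"the", "is", "it", "of", "not", "but", "this"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Steps 1 and 2: split the text into words, count them, drop stopwords."""
    words = re.findall(r"\w[\w']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


def font_sizes(freqs, min_size=10, max_size=40):
    """Step 3: scale each count linearly, never going below min_size."""
    top = max(freqs.values())
    return {
        word: max(min_size, round(max_size * count / top))
        for word, count in freqs.items()
    }


text = ("This is not the end. It is not even the beginning of the end. "
        "But it is, perhaps, the end of the beginning.")

freqs = word_frequencies(text)
sizes = font_sizes(freqs)

# Print each word with its count and the font size it would receive.
for word, count in freqs.most_common():
    print(word, count, sizes[word])
```

    With this sentence, "end" occurs three times and receives the largest size, "beginning" occurs twice, and "even" and "perhaps" occur once each.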

    Common parameters

     Example:

    import os
    from os import path
    
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    
    d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()
    with open(path.join(d, 'test.txt')) as f:
        text = f.read()
    text = text.lower()    # normalize case before counting
    wordcloud = WordCloud(background_color="white", width=800, height=660).generate(text)
    
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()
    wordcloud.to_file('test.png')    # save the object created above

     

     Getting test.txt

    Link: https://pan.baidu.com/s/1zfuK9-W5tyq1P8ftlQJuJQ
    Extraction code: iet4

    Further reference: http://amueller.github.io/word_cloud/

        https://github.com/amueller/word_cloud

  • Original article: https://www.cnblogs.com/baby123/p/13024713.html