Quick Start and Beyond with wordcloud, a Third-Party Python Library for Word Clouds

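Preface:

Author's development environment: Python 3 on Windows.
Beginners are advised to set up Python with Anaconda; it is convenient and makes learning faster and more efficient.

Introduction:

wordcloud is a small word cloud generator for Python.
GitHub: https://github.com/amueller/word_cloud
Official site: https://amueller.github.io/word_cloud/

Installation:

1–Install with conda (requires Anaconda; this is the recommended method):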

conda install -c conda-forge wordcloud

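2–Install with pip (the author's environment is Windows; the first attempt to install this way failed with an error, and the fixes suggested online did not resolve it):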

pip install wordcloud
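
After either command finishes, it can help to confirm that the package is visible to the interpreter you actually run. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration and is not part of wordcloud.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located by the current interpreter."""
    # find_spec returns None when no module of that name can be found.
    return importlib.util.find_spec(module_name) is not None


status = "installed" if is_installed("wordcloud") else "missing"
print("wordcloud:", status)
```

If this reports "missing" even though the install succeeded, the install most likely went into a different Python environment than the one running the script.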

Examples

1–Getting started

#!/usr/bin/env python
"""
Minimal Example
===============

Generate a square word cloud from the US Constitution using default arguments.
"""

from os import path
from wordcloud import WordCloud

d = path.dirname(__file__)

# Read the whole text.
text = open(path.join(d, 'constitution.txt')).read()

# Generate a word cloud image.
wordcloud = WordCloud().generate(text)

# Display the generated image with matplotlib.
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")

# max_font_size caps the size of the largest word in the cloud.
# width, height and margin control the dimensions and spacing of the image.
# generate() tokenizes the whole text automatically, but it handles Chinese poorly.
wordcloud = WordCloud(max_font_size=66).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

# Display the generated image with PIL instead (if you do not have matplotlib).
# image = wordcloud.to_image()
# image.show()

Figure: output of the getting-started example
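
The remark in the script that generate() handles Chinese poorly can be illustrated without wordcloud itself. The sketch below assumes the library splits text with a regular expression similar to `\w[\w']+` (the exact default may differ between versions), and the helper `count_words` is invented for this illustration. Because Chinese has no spaces between words, an unsegmented sentence comes out as one long "word".

```python
import re
from collections import Counter

# Assumption: this resembles the pattern wordcloud uses to find words.
WORD_PATTERN = r"\w[\w']+"


def count_words(text):
    """Count the tokens that the assumed word pattern finds in the text."""
    return Counter(re.findall(WORD_PATTERN, text))


english = count_words("we the people of the united states")
unsegmented = count_words("我喜欢词云我喜欢Python")
segmented = count_words("我 喜欢 词云 我 喜欢 Python")

print(english.most_common(1))
print(list(unsegmented))
print(segmented.most_common())
```

Once the words are separated by spaces, the counts become meaningful. In practice the Chinese text is usually run through a word segmenter first (jieba is a common choice, though it is not covered in this post) and the result is joined with spaces before being passed to generate().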

2–Using a mask image to generate a word cloud of any shape

#!/usr/bin/env python
"""
Masked wordcloud
================

A mask image lets you generate a word cloud of any shape.
"""

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS

d = path.dirname(__file__)

# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()

# Read the mask image (image source: http://www.stencilry.org/stencils/movies/alice%20in%20wonderland/255fk.jpg).
alice_mask = np.array(Image.open(path.join(d, "alice_mask.png")))

stopwords = set(STOPWORDS)
stopwords.add("said")
# Configure the word cloud.
wc = WordCloud(background_color="white", max_words=2000, mask=alice_mask,
               stopwords=stopwords)
# Generate the word cloud.
wc.generate(text)

# Save it to a local file.
wc.to_file(path.join(d, "alice.png"))

# Display the word cloud, then the mask itself.
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()
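
If no suitable mask picture is at hand, a mask can also be built in code. As far as I know, wordcloud treats pure white (255) mask entries as "masked out" and draws words only on the other entries. The sketch below builds a circular mask with plain Python lists; the helper `circle_mask` and its parameters are hypothetical, chosen only for illustration.

```python
def circle_mask(size, radius):
    """Build a size x size grid: 0 inside the circle (drawable), 255 outside (masked out)."""
    center = size // 2
    return [
        [0 if (row - center) ** 2 + (col - center) ** 2 <= radius ** 2 else 255
         for col in range(size)]
        for row in range(size)
    ]


mask = circle_mask(size=300, radius=130)

# To use it with wordcloud, convert the nested list to an array first, e.g.
#   WordCloud(background_color="white", mask=np.array(mask)).generate(text)
print(len(mask), len(mask[0]), mask[150][150], mask[0][0])
```

The same idea applies to any shape that can be described by a condition on the row and column indices.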

3–Coloring

#!/usr/bin/env python
"""
Using custom colors
===================

Using the recolor method and custom coloring functions.
"""

import numpy as np
from PIL import Image
from os import path
import matplotlib.pyplot as plt
import random

from wordcloud import WordCloud, STOPWORDS


def grey_color_func(word, font_size, position, orientation, random_state=None,
                    **kwargs):
    return "hsl(0, 0%%, %d%%)" % random.randint(60, 100)

d = path.dirname(__file__)

# Read the mask image (image source: http://www.stencilry.org/stencils/movies/star%20wars/storm-trooper.gif).
mask = np.array(Image.open(path.join(d, "stormtrooper_mask.png")))

# Text source: the movie script of "A New Hope" (http://www.imsdb.com/scripts/Star-Wars-A-New-Hope.html).
text = open(path.join(d, 'a_new_hope.txt')).read()

# Light preprocessing of the text.
text = text.replace("HAN", "Han")
text = text.replace("LUKE'S", "Luke")

# Add stopwords specific to movie scripts.
stopwords = set(STOPWORDS)
stopwords.add("int")
stopwords.add("ext")

wc = WordCloud(max_words=1000, mask=mask, stopwords=stopwords, margin=10,
               random_state=1).generate(text)
# Store the default colored image.
default_colors = wc.to_array()
plt.title("Custom colors")
plt.imshow(wc.recolor(color_func=grey_color_func, random_state=3),
           interpolation="bilinear")
wc.to_file("a_new_hope.png")
plt.axis("off")
plt.figure()
plt.title("Default colors")
plt.imshow(default_colors, interpolation="bilinear")
plt.axis("off")
plt.show()
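
One detail of the script above: grey_color_func draws from the global random module, so the random_state=3 passed to recolor does not make its shades repeatable. The sketch below is a variant that uses the random_state argument when one is supplied. It assumes that wordcloud passes either a random.Random instance or None for that argument; the function name `seeded_grey_color_func` is made up for illustration.

```python
import random


def seeded_grey_color_func(word, font_size, position, orientation,
                           random_state=None, **kwargs):
    """Return a random light grey as an HSL string, reproducible when a seeded RNG is given."""
    # Fall back to a fresh generator when no random_state is supplied.
    rng = random_state if random_state is not None else random.Random()
    return "hsl(0, 0%%, %d%%)" % rng.randint(60, 100)


first = seeded_grey_color_func("force", 20, (0, 0), None, random_state=random.Random(3))
second = seeded_grey_color_func("force", 20, (0, 0), None, random_state=random.Random(3))
print(first, second, first == second)
```

With this variant, calling recolor with the same seed should give the same greys on every run, which is handy when comparing layouts.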

#!/usr/bin/env python
"""
Image-colored wordcloud
=======================
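You can color a word cloud with an image-based coloring strategy, implemented in ImageColorGenerator. It uses the average color of the region each word occupies in the source image.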

"""

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

d = path.dirname(__file__)

# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()

# Read the mask / color image (downloaded from http://jirkavinse.deviantart.com/art/quot-Real-Life-quot-Alice-282261010).
alice_coloring = np.array(Image.open(path.join(d, "alice_color.png")))
stopwords = set(STOPWORDS)
stopwords.add("said")

wc = WordCloud(background_color="white", max_words=2000, mask=alice_coloring,
               stopwords=stopwords, max_font_size=40, random_state=42)
# Generate the word cloud.
wc.generate(text)

# Create the coloring from the image.
image_colors = ImageColorGenerator(alice_coloring)

# Display the word cloud with its default colors.
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")  # hide the axes
plt.figure()
# Recolor the word cloud and display it.
# Alternatively, pass color_func=image_colors directly to the WordCloud constructor.
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")  # hide the axes
plt.figure()
plt.imshow(alice_coloring, cmap=plt.cm.gray, interpolation="bilinear")
plt.axis("off")  # hide the axes
plt.show()  # show all three figures at once
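
To make the "average color of the region" idea concrete, here is a rough sketch of the calculation on a tiny image stored as nested lists of RGB tuples. It is not the library's implementation; the helper `average_color` and the toy image are invented for illustration, and ImageColorGenerator works on the real image array and the measured bounding box of each word.

```python
def average_color(pixels, top, left, height, width):
    """Return the mean RGB color of a rectangular patch as an 'rgb(r, g, b)' string."""
    patch = [pixels[row][col]
             for row in range(top, top + height)
             for col in range(left, left + width)]
    count = len(patch)
    # Integer mean of each channel over all pixels in the patch.
    red, green, blue = (sum(pixel[i] for pixel in patch) // count for i in range(3))
    return "rgb(%d, %d, %d)" % (red, green, blue)


# A 2x2 toy image: left column red, right column blue.
image = [
    [(255, 0, 0), (0, 0, 255)],
    [(255, 0, 0), (0, 0, 255)],
]

print(average_color(image, 0, 0, 2, 1))  # a word sitting on the red column
print(average_color(image, 0, 0, 2, 2))  # a word spanning both columns
```

A word that covers a single-colored area takes that color, while a word straddling two areas gets a blend of them.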

Another coloring example:
colored_by_group.py
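
The idea behind coloring by group is to assign a fixed color to each set of words and a default color to everything else. The sketch below is written from memory of that approach and is not a copy of colored_by_group.py; the class name `GroupColorFunc`, the word groups and the colors are all illustrative assumptions.

```python
class GroupColorFunc:
    """Color function that maps each word to the color of its group."""

    def __init__(self, color_to_words, default_color):
        # Invert {color: [words]} into {word: color} for fast lookup.
        self.word_to_color = {
            word: color
            for color, words in color_to_words.items()
            for word in words
        }
        self.default_color = default_color

    def __call__(self, word, **kwargs):
        # Extra keyword arguments (font_size, position, ...) are accepted and ignored.
        return self.word_to_color.get(word, self.default_color)


color_func = GroupColorFunc(
    color_to_words={
        "green": ["beautiful", "simple", "readable"],
        "red": ["ugly", "complex", "nested"],
    },
    default_color="grey",
)

print(color_func(word="simple", font_size=12))
print(color_func(word="python", font_size=12))
```

An instance like this could then be passed as color_func to WordCloud or to recolor, in the same way as grey_color_func in the earlier script.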

4–Emoji

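#!/usr/bin/env python
"""
Emoji Example
===============
A simple example that shows how to include emoji. Note that this example does not seem to work on OS X, but it does
work correctly on Ubuntu.
Including emoji takes 3 important steps:
1) Read the text input as UTF-8 text. The original example uses io.open rather than the built-in open for this; in Python 3 the two are the same function, so pass the encoding explicitly if your platform default is not UTF-8.
2) Override the regular expression that wordcloud uses to split the text into words. The default expression only matches plain words, so emoji would be dropped.
3) Override the default font with one that supports emoji. The bundled Symbola font includes black-and-white outlines for most emoji. An issue in the PIL / Pillow library currently seems to prevent
it from working correctly on OS X (https://github.com/python-pillow/Pillow/issues/1774),
so try Ubuntu if you run into problems.
"""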
import io
import string
from os import path
from wordcloud import WordCloud

d = path.dirname(__file__)

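# It is important to load the file as UTF-8. The encoding is passed explicitly because
# the platform default (for example on Windows) is not necessarily UTF-8.
text = io.open(path.join(d, 'happy-emoji.txt'), encoding='utf-8').read()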

# The regex used to detect words is a combination of normal words, ASCII art, and emoji.
# 2+ consecutive letters (apostrophes are also allowed), e.g. It's
normal_word = r"(?:\w[\w']+)"
# 2+ consecutive punctuation characters, e.g. :)
ascii_art = r"(?:[{punctuation}][{punctuation}]+)".format(punctuation=string.punctuation)
# A single character that is neither alphanumeric nor ASCII printable.
emoji = r"(?:[^\s])(?<![\w{ascii_printable}])".format(ascii_printable=string.printable)
regexp = r"{normal_word}|{ascii_art}|{emoji}".format(normal_word=normal_word, ascii_art=ascii_art,
                                                     emoji=emoji)

# Generate a word cloud image.
# The Symbola font includes most emoji.
font_path = path.join(d, 'fonts', 'Symbola', 'Symbola.ttf')
wordcloud = WordCloud(font_path=font_path, regexp=regexp).generate(text)

# Display the generated image with matplotlib.
import matplotlib.pyplot as plt
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

Figure: output of the emoji example
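
The combined regular expression is the least obvious part of this example, so it is worth trying on its own before handing it to WordCloud. The sketch below rebuilds the three patterns with the standard library only and applies them to a short sample string; the sample text is made up for illustration.

```python
import re
import string

# 2+ consecutive word characters, apostrophes allowed, e.g. It's
normal_word = r"(?:\w[\w']+)"
# 2+ consecutive punctuation characters, e.g. :)
ascii_art = r"(?:[{p}][{p}]+)".format(p=string.punctuation)
# A single non-space character that is neither a word character nor ASCII printable.
emoji = r"(?:[^\s])(?<![\w{a}])".format(a=string.printable)
regexp = r"{}|{}|{}".format(normal_word, ascii_art, emoji)

# \U0001F600 is the "grinning face" emoji, written as an escape to avoid encoding issues.
sample = "It's fun :) \U0001F600"
tokens = re.findall(regexp, sample)
print(tokens)
```

Each alternative picks up one kind of token: the word with its apostrophe, the ASCII smiley, and the single emoji character. If the backslashes in the patterns are lost (for instance when code is copied from a web page), the expression still compiles but matches literal letters such as w and s instead, so the results look wrong without any error being raised.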

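All of the examples above can be downloaded from my GitHub (continuously updated with demos of third-party Python libraries, common Python crawlers, WeChat automation with Python, and more):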
https://github.com/Snailclimb/Python/tree/master/PythonDemo/wordcloud

Original article: https://www.cnblogs.com/snailclimb/p/9086435.html