Word Cloud Visualization with Python

2021-03-06 Python那些事


Author: 沂水寒城
Source: https://blog.csdn.net/Together_CZ/article/details/92764128

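Word clouds are a very attractive way to visualize text. As the saying goes, a picture is worth a thousand words. I have used word clouds a lot in past projects, and for me a word cloud may even be a good way to introduce myself, like the one below:

Personally, I think this is a little more engaging than a dry, purely written self-introduction.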

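Today's post is not about using word clouds for self-introductions. Instead, it summarizes the word-cloud techniques I use most often at work, which fall into three categories:

1. A simple rectangular word cloud, like the one above

2. A word cloud whose shape is taken from a background image

3. A word cloud with custom font colors, for cases where the default color scheme shown above is not wanted

Below, each of the three types is implemented and demonstrated. The test data used throughout is the following text: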

The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
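All three functions below take a {word: frequency} dictionary. The main block of the complete listing simply counts whitespace-separated tokens, so punctuation stays attached ("ugly." and "ugly" count as different words) and filler words such as "is" and "than" end up among the largest. A minimal preprocessing sketch using only the standard library; the function name text_to_frequencies and the tiny stop-word set are hypothetical choices for illustration (the wordcloud package also ships its own STOPWORDS set):

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set used only for this sketch
STOP_WORDS = {"a", "and", "be", "is", "it", "of", "than", "the", "to"}

ZEN = """
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
"""


def text_to_frequencies(text, stop_words=STOP_WORDS):
    """Lower-case the text, keep runs of letters/apostrophes, drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


if __name__ == "__main__":
    freqs = text_to_frequencies(ZEN)
    print(freqs.most_common(5))
```

A Counter is a dict subclass, so the result can be passed directly as freDictpath to the functions below.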

1. A simple rectangular word cloud. The implementation is as follows:

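import matplotlib.pyplot as plt
from wordcloud import WordCloud

def simpleWC1(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word cloud visualization demo.
    freDictpath is either a {word: frequency} dict or the path of a text
    file containing one "word<sep>count" pair per line.
    '''
    if isinstance(freDictpath, dict):
        fre_dict=freDictpath
    else:
        fre_dict={}
        with open(freDictpath, encoding='utf-8') as f:
            for line in f:
                parts=line.strip().split(sep)
                if len(parts)>=2:  # skip blank or malformed lines
                    fre_dict[parts[0]]=int(parts[1])
    wc=WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                background_color=back,  # background color
                max_words=1300,  # maximum number of words shown
                max_font_size=120,  # largest font size
                margin=3,  # padding around each word
                width=1800,  # canvas width
                height=800,  # canvas height
                random_state=42)  # fixed seed for a reproducible layout
    wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dict
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)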
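simpleWC1 accepts freDictpath as either a ready-made dictionary or a file path; despite the .json default name, that file is read as plain text with one "word<sep>count" pair per line. A minimal sketch that isolates this loading step so it can be checked on its own; load_fre_dict is a hypothetical helper name, and the sample file contents are made up for the demo:

```python
import os
import tempfile


def load_fre_dict(source, sep=" "):
    """Return a {word: count} dict from a dict or a 'word<sep>count' text file."""
    if isinstance(source, dict):
        return dict(source)
    fre_dict = {}
    with open(source, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(sep)
            if len(parts) >= 2:  # ignore blank or malformed lines
                fre_dict[parts[0]] = int(parts[1])
    return fre_dict


if __name__ == "__main__":
    # Write a small sample file in the expected format, then read it back
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
        tmp.write("python 5\nwordcloud 3\n\nmatplotlib 2\n")
        path = tmp.name
    try:
        print(load_fre_dict(path))
    finally:
        os.remove(path)
```

Checking the type explicitly, rather than catching every exception, means a missing file is reported as a missing file instead of the path string being treated as the frequency dictionary.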

The resulting image is shown below:

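2. A word cloud based on a background image. The implementation is as follows:

First, here is the background image:

This is a fairly classic image. Now for the implementation: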

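import numpy as np
from PIL import Image
from wordcloud import WordCloud

def simpleWC2(sep=' ',back='black',backPic='a.png',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word cloud visualization demo [using a background image]
    '''
    if isinstance(freDictpath, dict):
        fre_dict=freDictpath
    else:
        fre_dict={}
        with open(freDictpath, encoding='utf-8') as f:
            for line in f:
                parts=line.strip().split(sep)
                if len(parts)>=2:  # skip blank or malformed lines
                    fre_dict[parts[0]]=int(parts[1])
    # scipy.misc.imread no longer exists in current SciPy; load the image with Pillow
    back_coloring=np.array(Image.open(backPic))
    wc=WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                background_color=back,max_words=1300,
                mask=back_coloring,  # background image: pure-white pixels are masked out
                max_font_size=120,  # largest font size
                margin=3,width=1800,height=800,random_state=42)  # width/height are ignored when a mask is given
    wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dict
    wc.to_file(savepath)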
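With mask set, wordcloud draws words only where the mask is not pure white: pixels equal to 255 (#FFFFFF) count as "masked out", and the canvas size comes from the mask instead of width/height. The background image therefore needs a white background around a non-white shape. If no suitable picture is at hand, a mask can also be generated with NumPy. A minimal sketch; circle_mask and drawable_fraction are hypothetical helpers, and drawable_fraction only restates the white-means-masked rule for illustration:

```python
import numpy as np


def circle_mask(size=400, margin=10):
    """uint8 mask: 0 (free to draw) inside a circle, 255 (masked out) outside."""
    radius = size // 2 - margin
    y, x = np.ogrid[:size, :size]  # broadcastable row/column coordinate grids
    outside = (x - size // 2) ** 2 + (y - size // 2) ** 2 > radius ** 2
    mask = np.zeros((size, size), dtype=np.uint8)
    mask[outside] = 255
    return mask


def drawable_fraction(mask):
    """Share of pixels that are not pure white, i.e. where words may be placed."""
    if mask.ndim == 3:  # RGB/RGBA image: white only if R, G and B are all 255
        white = np.all(mask[:, :, :3] == 255, axis=-1)
    else:
        white = mask == 255
    return 1.0 - white.mean()


if __name__ == "__main__":
    m = circle_mask()
    print(m.shape, round(drawable_fraction(m), 3))
    # With wordcloud (not run here): WordCloud(mask=circle_mask(), ...)
```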

The resulting image is shown below:

3. A word cloud with custom font colors. The implementation is as follows:

from matplotlib import colors
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Custom color list
color_list=['#CD853F','#DC143C','#00FF7F','#FF6347','#8B008B','#00FFFF','#0000FF','#8B0000','#FF8C00',
            '#1E90FF','#00FF00','#FFD700','#008080','#008B8B','#8A2BE2','#228B22','#FA8072','#808080']

def simpleWC3(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word cloud visualization demo [custom font colors]
    '''
    # Build a colormap object from the custom color list
    colormap=colors.ListedColormap(color_list)
    if isinstance(freDictpath, dict):
        fre_dict=freDictpath
    else:
        fre_dict={}
        with open(freDictpath, encoding='utf-8') as f:
            for line in f:
                parts=line.strip().split(sep)
                if len(parts)>=2:  # skip blank or malformed lines
                    fre_dict[parts[0]]=int(parts[1])
    wc=WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                background_color=back,  # background color
                max_words=1300,  # maximum number of words shown
                max_font_size=120,  # largest font size
                colormap=colormap,  # word colors are drawn from this custom colormap
                margin=2,width=1800,height=800,random_state=42,
                prefer_horizontal=0.5)  # share of words tried horizontally first; the rest are tried rotated 90 degrees
    wc.generate_from_frequencies(fre_dict)
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)
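With a ListedColormap, wordcloud draws a random value for each word and looks it up in the colormap, so each color in color_list is chosen with roughly equal probability regardless of word size. For more direct control, WordCloud also accepts a color_func callback (called with word, font_size, position, orientation, random_state and further keyword arguments, returning a color), and an existing cloud can be recolored with wc.recolor(color_func=...). A minimal sketch of such a callback that picks from color_list; make_color_func is a hypothetical helper:

```python
import random

color_list = ['#CD853F', '#DC143C', '#00FF7F', '#FF6347', '#8B008B', '#00FFFF',
              '#0000FF', '#8B0000', '#FF8C00', '#1E90FF', '#00FF00', '#FFD700',
              '#008080', '#008B8B', '#8A2BE2', '#228B22', '#FA8072', '#808080']


def make_color_func(colors, seed=None):
    """Build a wordcloud-style color_func that picks a random color from `colors`."""
    fallback_rng = random.Random(seed)

    def color_func(word, font_size, position, orientation, random_state=None, **kwargs):
        # wordcloud passes a random.Random instance; fall back to our own if absent
        rng = random_state if random_state is not None else fallback_rng
        return rng.choice(colors)

    return color_func


if __name__ == "__main__":
    pick = make_color_func(color_list, seed=42)
    print([pick(w, 40, (0, 0), None) for w in ("python", "zen", "better")])
    # With wordcloud (not run here):
    #   wc = WordCloud(..., color_func=make_color_func(color_list))
    #   wc.recolor(color_func=make_color_func(color_list))
```

Because the callback sees each word and its font size, the same hook could, for example, give the largest words a fixed highlight color instead of a random one.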

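The resulting image is shown below:

These three methods are the word-cloud visualizations I use most often in practice. The complete code follows; it can be run as-is, provided the imported packages are installed and the font file and background image referenced in the code exist at those paths: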

#!/usr/bin/env python
# -*- coding: utf-8 -*-

'''
__Author__: 沂水寒城
Purpose: word cloud visualization module (Python 3)
'''

from collections import Counter

import numpy as np
from PIL import Image
from matplotlib import colors
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Custom color list
color_list=['#CD853F','#DC143C','#00FF7F','#FF6347','#8B008B','#00FFFF','#0000FF','#8B0000','#FF8C00',
            '#1E90FF','#00FF00','#FFD700','#008080','#008B8B','#8A2BE2','#228B22','#FA8072','#808080']

def load_fre_dict(freDictpath, sep=' '):
    '''
    Return a {word: frequency} dict. freDictpath may already be such a dict,
    or the path of a text file with one "word<sep>count" pair per line.
    '''
    if isinstance(freDictpath, dict):
        return freDictpath
    fre_dict={}
    with open(freDictpath, encoding='utf-8') as f:
        for line in f:
            parts=line.strip().split(sep)
            if len(parts)>=2:  # skip blank or malformed lines
                fre_dict[parts[0]]=int(parts[1])
    return fre_dict

def simpleWC1(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word cloud visualization demo
    '''
    fre_dict=load_fre_dict(freDictpath, sep)
    wc=WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                background_color=back,  # background color
                max_words=1300,  # maximum number of words shown
                max_font_size=120,  # largest font size
                margin=3,  # padding around each word
                width=1800,  # canvas width
                height=800,  # canvas height
                random_state=42)  # fixed seed for a reproducible layout
    wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dict
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)

def simpleWC2(sep=' ',back='black',backPic='a.png',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word cloud visualization demo [using a background image]
    '''
    fre_dict=load_fre_dict(freDictpath, sep)
    # scipy.misc.imread no longer exists in current SciPy; load the image with Pillow
    back_coloring=np.array(Image.open(backPic))
    wc=WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                background_color=back,max_words=1300,
                mask=back_coloring,  # background image: pure-white pixels are masked out
                max_font_size=120,  # largest font size
                margin=3,width=1800,height=800,random_state=42)  # width/height are ignored when a mask is given
    wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dict
    wc.to_file(savepath)

def simpleWC3(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word cloud visualization demo [custom font colors]
    '''
    # Build a colormap object from the custom color list
    colormap=colors.ListedColormap(color_list)
    fre_dict=load_fre_dict(freDictpath, sep)
    wc=WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                background_color=back,  # background color
                max_words=1300,  # maximum number of words shown
                max_font_size=120,  # largest font size
                colormap=colormap,  # word colors are drawn from this custom colormap
                margin=2,width=1800,height=800,random_state=42,
                prefer_horizontal=0.5)  # share of words tried horizontally first; the rest are tried rotated 90 degrees
    wc.generate_from_frequencies(fre_dict)
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)

if __name__ == '__main__':
    text="""
        The Zen of Python, by Tim Peters
        Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably only one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!
        """
    # Count whitespace-separated tokens (punctuation stays attached, e.g. "ugly.")
    fre_dict=dict(Counter(text.split()))
    simpleWC1(sep=' ',back='black',freDictpath=fre_dict,savepath='simpleWC1.png')
    simpleWC2(sep=' ',back='black',backPic='backPic/A.png',freDictpath=fre_dict,savepath='simpleWC2.png')
    simpleWC3(sep=' ',back='black',freDictpath=fre_dict,savepath='simpleWC3.png')

(End)
