A Beginner-Friendly Introduction to seaborn by Example

2021-01-14 AI派

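seaborn groups its plotting functions into 5 broad categories (relational, categorical, distribution, regression and matrix plots), 21 functions in total:

relplot() figure-level interface for relational plots; it wraps the two functions below, selected with the kind parameter

scatterplot() scatter plot

lineplot() line plot

stripplot() categorical scatter plot

swarmplot() categorical scatter plot with non-overlapping points, so the density of the distribution stays visible

boxplot() box plot

violinplot() violin plot

boxenplot() enhanced box plot

pointplot() point plot

barplot() bar plot

countplot() count plot

jointplot() bivariate plot with marginal distributions

pairplot() grid of pairwise relationships between variables

distplot() histogram, optionally with a kernel density estimate

kdeplot() kernel density estimate plot

rugplot() draws each data point of an array as a tick along an axis

lmplot() regression model plot

regplot() linear regression plot

residplot() residual plot of a linear regression

heatmap() heat map

clustermap() clustered heat map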

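The examples below cover most of the plots listed above. As long as the dataset is in the expected shape, most plots take a single line of code. After working through the examples you should have a basic grasp of plotting with seaborn; if you need finer control, consult the seaborn documentation for the other default parameters of each plot type.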

%matplotlib inline

import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

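set() resets the theme parameters to their defaults, or switches theme when given a style:
There are five theme styles: darkgrid, whitegrid, dark, white and ticks. The default theme is darkgrid.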
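
To see what separates the five themes, one can inspect the parameter dictionaries behind them. The sketch below assumes `sns.axes_style()`, which returns the matplotlib rc parameters for a named style; the variable names are illustrative only.

```python
import seaborn as sns

# The five built-in theme names listed above.
styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# axes_style(name) returns a dict of matplotlib rc parameters for that theme.
style_params = {name: sns.axes_style(name) for name in styles}

# Record whether each theme switches the background grid on.
grid_on = {name: params["axes.grid"] for name, params in style_params.items()}

for name in styles:
    facecolor = style_params[name]["axes.facecolor"]
    print(f"{name:10s} grid={grid_on[name]}  facecolor={facecolor}")
```

Only the two "grid" themes enable the grid; the others differ mainly in background color and tick marks.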

sns.set(style="ticks")

df = sns.load_dataset("anscombe")
df.head()

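seaborn provides a number of sample datasets, loaded as pandas DataFrames (load_dataset() downloads them on first use, so it needs network access). To inspect the data, use a command such as df.head().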

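lmplot (regression plot)

lmplot draws a scatter plot with a fitted regression line, optionally split into facets, which gives a quick overview of the relationships in the data.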

sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
col_wrap=2, ci=None, palette="muted", height=4,
scatter_kws={"s": 50, "alpha": 1})
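
The anscombe dataset is a classic reminder of why plotting matters: its four groups look completely different, yet they share almost the same least-squares line. A minimal numerical check with NumPy, assuming the values of datasets I and II as they are commonly tabulated (hard-coded here so the sketch runs offline):

```python
import numpy as np

# Anscombe's quartet, datasets I and II (they share the same x values).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

# np.polyfit with deg=1 returns (slope, intercept) of the least-squares line,
# the same kind of straight-line fit that lmplot draws by default.
slope1, intercept1 = np.polyfit(x, y1, deg=1)
slope2, intercept2 = np.polyfit(x, y2, deg=1)

print(f"dataset I : y = {intercept1:.2f} + {slope1:.2f} * x")
print(f"dataset II: y = {intercept2:.2f} + {slope2:.2f} * x")
```

Both fits come out close to y = 3 + 0.5x, even though dataset I is roughly linear and dataset II is a curve, which is exactly what the faceted lmplot makes visible.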

sns.set()

iris = sns.load_dataset("iris")

g = sns.lmplot(x="sepal_length", y="sepal_width", hue="species",
               truncate=True, height=5, data=iris)

# The iris measurements are recorded in centimetres.
g.set_axis_labels("Sepal length (cm)", "Sepal width (cm)")

sns.set(style="darkgrid")

df = sns.load_dataset("titanic")

pal = dict(male="#6495ED", female="#F08080")

# logistic=True fits a logistic regression and requires statsmodels.
g = sns.lmplot(x="age", y="survived", col="sex", hue="sex", data=df,
               palette=pal, y_jitter=.02, logistic=True)

g.set(xlim=(0, 80), ylim=(-.05, 1.05))

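kdeplot (kernel density estimate plot)

Kernel density estimation is a technique from probability theory for estimating an unknown density function; it is a nonparametric estimation method. A KDE plot gives a fairly direct view of how the sample itself is distributed. Usage: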

sns.set(style="dark")

rs = np.random.RandomState(50)

f, axes = plt.subplots(3, 3, figsize=(9, 9), sharex=True, sharey=True)
for ax, s in zip(axes.flat, np.linspace(0, 3, 10)):

    cmap = sns.cubehelix_palette(start=s, light=1, as_cmap=True)

    x, y = rs.randn(2, 50)
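    # Pass both variables by keyword and use fill= (seaborn >= 0.11);
    # recent releases reject a positional y and deprecate shade=.
    sns.kdeplot(x=x, y=y, cmap=cmap, fill=True, cut=5, ax=ax)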
    ax.set(xlim=(-3, 3), ylim=(-3, 3))
f.tight_layout()
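
To make the idea of kernel density estimation concrete, here is a minimal one-dimensional sketch in plain NumPy: every sample contributes a small Gaussian bump, and the estimate is the average of those bumps. The helper name `gaussian_kde_1d` and the rule-of-thumb bandwidth are assumptions for illustration, not necessarily what seaborn does internally.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.RandomState(50)
samples = rng.randn(200)
grid = np.linspace(-6, 6, 601)

# Rule-of-thumb bandwidth (normal reference rule); an assumption for this sketch.
bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)

density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should be non-negative and integrate to roughly 1.
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth={bandwidth:.3f}, area under curve={area:.4f}")
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve; the two-dimensional plots above apply the same idea with a bivariate kernel.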

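FacetGrid

An interface for drawing multiple plots arranged in a grid.
Steps:
1. Instantiate the grid object
2. Call map() to draw a specific seaborn (or matplotlib) plot type on every facet
3. Add a legend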

sns.set(style="darkgrid")
tips = sns.load_dataset("tips")

g = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
bins = np.linspace(0, 60, 13)
g.map(plt.hist, "total_bill", color="steelblue", bins=bins)

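distplot (univariate histogram)

The quickest way to get a feel for a univariate distribution in seaborn is distplot(). By default it draws a histogram and overlays a kernel density estimate (KDE). Note that distplot() has been deprecated since seaborn 0.11 in favor of histplot() and displot(); on versions that still ship it, the calls below work but emit a warning.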

sns.set(style="white", palette="muted", color_codes=True)
rs = np.random.RandomState(10)

f, axes = plt.subplots(2, 2, figsize=(7, 7), sharex=True)
sns.despine(left=True)

d = rs.normal(size=100)

sns.distplot(d, kde=False, color="b", ax=axes[0, 0])

sns.distplot(d, hist=False, rug=True, color="r", ax=axes[0, 1])

sns.distplot(d, hist=False, color="g", kde_kws={"shade": True}, ax=axes[1, 0])

sns.distplot(d, color="m", ax=axes[1, 1])
plt.setp(axes, yticks=[])
plt.tight_layout()

lineplot

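lineplot() works most naturally with a long-form pandas DataFrame passed as data, with x, y, hue and style given as column names. It also accepts plain arrays.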

sns.set(style="darkgrid")

fmri = sns.load_dataset("fmri")
sns.lineplot(x="timepoint", y="signal",
hue="region", style="event",
data=fmri)

relplot

relplot() is a figure-level function that shows statistical relationships using two common approaches, scatter plots and line plots.

sns.set(style="ticks")
dots = sns.load_dataset("dots")
palette = dict(zip(dots.coherence.unique(),
                   sns.color_palette("rocket_r", 6)))

sns.relplot(x="time", y="firing_rate",
            hue="coherence", size="choice", col="align",
            size_order=["T1", "T2"], palette=palette,
            height=5, aspect=.75, facet_kws=dict(sharex=False),
            kind="line", legend="full", data=dots)

boxplot

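A box plot (also called a box-and-whisker plot) is a statistical chart that summarizes the spread of a set of data. It shows the median and the upper and lower quartiles; the whiskers reach the minimum and maximum, except that by default points more than 1.5 times the interquartile range beyond the quartiles are drawn individually as outliers.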

sns.set(style="ticks", palette="pastel")

tips = sns.load_dataset("tips")

sns.boxplot(x="day", y="total_bill",
hue="smoker", palette=["m", "g"],
data=tips)

sns.despine(offset=10, trim=True)
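
The numbers behind a single box can be computed by hand. The sketch below uses NumPy on a small made-up sample and assumes the usual 1.5 x IQR whisker rule, which is the matplotlib and seaborn default:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

# Box: lower quartile, median, upper quartile.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whiskers extend to the most extreme points within 1.5 * IQR of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences is drawn as an individual outlier point.
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"box: {q1} / {median} / {q3}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers.tolist()}")
```

Here the value 30 lies above the upper fence, so the upper whisker stops at 9 and 30 appears as a separate point.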

violinplot

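violinplot plays a similar role to boxplot: it shows the distribution of quantitative data across the levels of one (or more) categorical variables so that those distributions can be compared. Unlike a box plot, in which every component corresponds to actual data points, a violin plot features a kernel density estimate of the underlying distribution.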

sns.set()
rs = np.random.RandomState(0)
n, p = 40, 8
d = rs.normal(0, 2, (n, p))
d += np.log(np.arange(1, p + 1)) * -5 + 10
pal = sns.cubehelix_palette(p, rot=-.5, dark=.3)

sns.violinplot(data=d, palette=pal, inner="points")

sns.set(style="whitegrid", palette="pastel", color_codes=True)
tips = sns.load_dataset("tips")
sns.violinplot(x="day", y="total_bill", hue="smoker",
               split=True, inner="quart",
               palette={"Yes": "y", "No": "b"},
               data=tips)

sns.despine(left=True)

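heatmap (heat map)

A heat map color-encodes a matrix of values. A common use is to show the pairwise similarity (for example, the correlation) of several features in a table; the example below instead shows passenger counts by month and year.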

sns.set()
flights_long = sns.load_dataset("flights")
# pivot() takes keyword arguments; pandas >= 2.0 rejects the positional form.
flights = flights_long.pivot(index="month", columns="year", values="passengers")

f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax)
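
For the pairwise-similarity use mentioned above, a typical pattern is to compute a correlation matrix with pandas and hand it to heatmap(). A sketch on synthetic features (the columns a to d are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.RandomState(0)
a = rng.normal(size=200)
features = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.5, size=200),   # strongly correlated with a
    "c": -a + rng.normal(scale=2.0, size=200),  # weakly, negatively correlated with a
    "d": rng.normal(size=200),                  # independent noise
})

# Pairwise Pearson correlations: a symmetric matrix with ones on the diagonal.
corr = features.corr()

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", vmin=-1, vmax=1, center=0,
            cmap="vlag", linewidths=.5, ax=ax)
```

Fixing vmin and vmax at -1 and 1 and centering the diverging palette at 0 keeps the colors comparable between datasets.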

jointplot

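jointplot plots two variables together. Visualizing the joint distribution of two variables is often useful, and the simplest way to do so in seaborn is jointplot(), which creates a multi-panel figure showing both the relationship between the two variables and the distribution of each variable along its own axis.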

sns.set(style="white")
rs = np.random.RandomState(5)
mean = [0, 0]
cov = [(1, .5), (.5, 1)]
x1, x2 = rs.multivariate_normal(mean, cov, 500).T
x1 = pd.Series(x1, name="$X_1$")
x2 = pd.Series(x2, name="$X_2$")

# x and y must be passed by keyword in seaborn >= 0.12.
g = sns.jointplot(x=x1, y=x2, kind="kde", height=7, space=0)

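Hexbin plot
The bivariate analogue of a histogram is known as a "hexbin" plot, because it shows the number of observations that fall within hexagonal bins. It works best with relatively large datasets.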

sns.set(style="ticks")
rs = np.random.RandomState(11)
x = rs.gamma(2, size=1000)
y = -.5 * x + rs.normal(size=1000)
sns.jointplot(x=x, y=y, kind="hex", color="#4CB391")
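
To see the "histogram" side of a hexbin plot, one can call matplotlib's own hexbin() directly (which, as far as I can tell, is what the hex kind builds on) and read back the per-hexagon counts. The gridsize of 20 is an arbitrary choice for this sketch:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rs = np.random.RandomState(11)
x = rs.gamma(2, size=1000)
y = -.5 * x + rs.normal(size=1000)

fig, ax = plt.subplots()
hexes = ax.hexbin(x, y, gridsize=20)

# One count per hexagon; together they account for every observation.
counts = np.asarray(hexes.get_array())
print(f"{counts.size} hexagons, {int(counts.sum())} observations, "
      f"fullest bin holds {int(counts.max())}")
```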

sns.set(style="darkgrid")
tips = sns.load_dataset("tips")
g = sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg",
                  xlim=(0, 60), ylim=(0, 12), color="m", height=7)

catplot

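The figure-level interface for categorical plots; the kind parameter selects one of the following eight plots:

stripplot() categorical scatter plot
swarmplot() categorical scatter plot with non-overlapping points, so the density of the distribution stays visible
boxplot() box plot
violinplot() violin plot
boxenplot() enhanced box plot
pointplot() point plot
barplot() bar plot
countplot() count plot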

sns.set(style="whitegrid")
titanic = sns.load_dataset("titanic")
g = sns.catplot(x="class", y="survived", hue="sex", data=titanic,
height=6, kind="bar", palette="muted")

g.despine(left=True)
g.set_ylabels("survival probability")

sns.set(style="whitegrid")
df = sns.load_dataset("exercise")
g = sns.catplot(x="time", y="pulse", hue="kind", col="diet",
capsize=.2, palette="YlGnBu_d", height=6, aspect=.75,
kind="point", data=df)
g.despine(left=True)

pointplot

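A point plot represents an estimate of central tendency for a numeric variable by the position of a point, and uses error bars to indicate the uncertainty around that estimate. Point plots can be more useful than bar plots for comparing different levels of one or more categorical variables. They are particularly good at showing interactions: how the relationship between the levels of one categorical variable changes across the levels of a second one. The lines joining points of the same hue level let you judge interactions by differences in slope, which is easier for the eye than comparing the heights of several groups of points or bars. (The example below overlays the means on the raw observations and turns the connecting lines and error bars off.)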

sns.set(style="whitegrid")
iris = sns.load_dataset("iris")
iris = pd.melt(iris, "species", var_name="measurement")
f, ax = plt.subplots()
sns.despine(bottom=True, left=True)

sns.stripplot(x="value", y="measurement", hue="species",
              data=iris, dodge=True, jitter=True,
              alpha=.25, zorder=1)

sns.pointplot(x="value", y="measurement", hue="species",
              data=iris, dodge=.532, join=False, palette="dark",
              markers="d", scale=.75, ci=None)

handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[3:], labels[3:], title="species",
          handletextpad=0, columnspacing=1,
          loc="lower right", ncol=3, frameon=True)
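
The example above switches the error bars off with ci=None. When they are on, the default is a bootstrap confidence interval around the mean, and the idea can be sketched in a few lines of NumPy. The sample, the 1000 resamples and the 95% level are illustrative choices here:

```python
import numpy as np

rng = np.random.RandomState(0)
sample = rng.normal(loc=5.0, scale=1.0, size=50)

# Resample with replacement many times and record the mean of each resample.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(1000)
])

# The middle 95% of the bootstrap means forms the percentile interval.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean={sample.mean():.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

The point marks the mean and the error bar spans the interval; a wider bar signals a less certain estimate.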

boxenplot (enhanced box plot)

sns.set(style="whitegrid")
diamonds = sns.load_dataset("diamonds")
clarity_ranking = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

sns.boxenplot(x="clarity", y="carat",
color="b", order=clarity_ranking,
scale="linear", data=diamonds)

scatterplot (scatter plot)

sns.set()
planets = sns.load_dataset("planets")
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
ax = sns.scatterplot(x="distance", y="orbital_period",
                     hue="year", size="mass",
                     palette=cmap, sizes=(10, 200),
                     data=planets)

PairGrid

Draws a grid of subplots showing the pairwise relationships in a dataset.

sns.set(style="white")
df = sns.load_dataset("iris")
g = sns.PairGrid(df, diag_sharey=False)
g.map_lower(sns.kdeplot)
g.map_upper(sns.scatterplot)
g.map_diag(sns.kdeplot, lw=3)

residplot

Residual plot of a linear regression

sns.set(style="whitegrid")

rs = np.random.RandomState(7)
x = rs.normal(2, 1, 75)
y = 2 + 1.5 * x + rs.normal(0, 2, 75)

# x and y must be passed by keyword in seaborn >= 0.12; lowess=True requires statsmodels.
sns.residplot(x=x, y=y, lowess=True, color="g")
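
The quantity on the vertical axis of a residual plot is the observed y minus the value predicted by the fitted line. A NumPy sketch using the same synthetic data as above:

```python
import numpy as np

rs = np.random.RandomState(7)
x = rs.normal(2, 1, 75)
y = 2 + 1.5 * x + rs.normal(0, 2, 75)

# Fit an ordinary least-squares line, then subtract its predictions.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# For a least-squares fit with an intercept, the residuals average to zero
# and are uncorrelated with x; structure left in them hints at a poor model.
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"mean residual={residuals.mean():.2e}")
```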

swarmplot

A categorical scatter plot with non-overlapping points, so the density of the distribution stays visible

sns.set(style="whitegrid", palette="muted")

iris = sns.load_dataset("iris")
iris = pd.melt(iris, "species", var_name="measurement")
sns.swarmplot(x="measurement", y="value", hue="species",
palette=["r", "c", "y"], data=iris)
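
Both this example and the pointplot one reshape iris with pd.melt() before plotting, turning one column per measurement into one row per measurement. A tiny sketch of that reshaping on a two-row stand-in table (the values are illustrative):

```python
import pandas as pd

wide = pd.DataFrame({
    "species": ["setosa", "versicolor"],
    "sepal_length": [5.1, 7.0],
    "sepal_width": [3.5, 3.2],
})

# Keep "species" as the identifier; every other column becomes a
# (measurement, value) pair on its own row.
long = pd.melt(wide, "species", var_name="measurement")

print(long)
```

The long form has the columns species, measurement and value, which is why the plotting calls can use x="measurement" and y="value".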

pairplot

Grid of pairwise relationships between variables

sns.set(style="ticks")
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")

clustermap

Clustered heat map

sns.set()

df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)

used_networks = [1, 5, 6, 7, 8, 12, 13, 17]
used_columns = (df.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
df = df.loc[:, used_columns]

network_pal = sns.husl_palette(8, s=.45)
network_lut = dict(zip(map(str, used_networks), network_pal))

networks = df.columns.get_level_values("network")
network_colors = pd.Series(networks, index=df.columns).map(network_lut)

sns.clustermap(df.corr(), center=0, cmap="vlag",
               row_colors=network_colors, col_colors=network_colors,
               linewidths=.75, figsize=(13, 13))
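
What clustermap() adds over heatmap() is a hierarchical clustering that reorders rows and columns so that similar ones sit next to each other. The sketch below builds two pairs of related synthetic columns (names and noise levels are made up) and reads the resulting order back from the returned grid; it assumes scipy is installed, which clustermap() needs:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.RandomState(0)
base_a = rng.normal(size=100)
base_b = rng.normal(size=100)
demo = pd.DataFrame({
    "a1": base_a + rng.normal(scale=0.3, size=100),
    "b1": base_b + rng.normal(scale=0.3, size=100),
    "a2": base_a + rng.normal(scale=0.3, size=100),
    "b2": base_b + rng.normal(scale=0.3, size=100),
})

g = sns.clustermap(demo.corr(), center=0, cmap="vlag", figsize=(5, 5))

# reordered_ind holds the original column positions in their clustered order.
order = [demo.columns[i] for i in g.dendrogram_col.reordered_ind]
print(order)
```

Although a1 and a2 are separated in the input, the clustering places them side by side, and likewise b1 and b2.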
